Quantifying Causal Coupling Strength: 
A Lag-specific Measure For Multivariate Time Series Related To Transfer Entropy 
While it is an important problem to identify the existence of causal associations between two components 
of a multivariate time series, a topic addressed in [J. Runge, J. Heitzig, V. Petoukhov, and J. Kurths, Physical 
Review Letters 108, 258701 (2012)], it is even more important to assess the strength of their association in a 
meaningful way. In the present article we focus on the problem of defining a meaningful coupling strength using 
information-theoretic measures and demonstrate the shortcomings of the well-known mutual information and 
transfer entropy. Instead, we propose a certain time-delayed conditional mutual information, the momentary 
information transfer (MIT), as a measure of association that is general, causal and lag-specific, reflects a well 
interpretable notion of coupling strength and is practically computable. Rooted in information theory, MIT is 
general, in that it does not assume a certain model class underlying the process that generates the time series. 
As discussed in a previous paper [J. Runge, J. Heitzig, V. Petoukhov, and J. Kurths, Physical Review Letters 
108, 258701 (2012)], the general framework of graphical models makes MIT causal, in that it gives a non- 
zero value only to lagged components that are not independent conditional on the remaining process. Further, 
graphical models admit a low-dimensional formulation of conditions which is important for a reliable estimation 
of conditional mutual information and thus makes MIT practically computable. MIT is based on the fundamental 
concept of source entropy, which we utilize to yield a notion of coupling strength that is, compared to mutual 
information and transfer entropy, well interpretable, in that for many cases it solely depends on the interaction 
of the two components at a certain lag. In particular, MIT is thus in many cases able to exclude the misleading 
influence of autodependency within a process in an information-theoretic way. We formalize and prove this idea 
analytically and numerically for a general class of nonlinear stochastic processes and illustrate the potential of 
MIT on climatological data. 

PACS numbers: 89.70.Cf, 02.50.-r, 05.45.Tp, 89.70.-a 



I. INTRODUCTION 



Today's scientific world produces a vastly growing and 
technology-driven abundance of data across all research fields 
from observations of natural processes to economic data [1]. 
To test or generate hypotheses on interdependencies between 
processes underlying the data, statistical measures of association 
are needed. Recently, Reshef et al. [2] put forward 
two key demands such a measure should fulfill in the bivari- 
ate case: (1) generality, i.e., the measure should not be re- 
stricted to certain types of associations like linear measures, 
and (2) equitability, which means that the measure should 
reflect a certain heuristic notion of coupling strength, i.e., it 
should give similar scores to equally noisy dependencies. The 
latter is especially important for comparisons and ranking of 
the strength of dependencies. In this article we generalize 
this idea to multivariate data as needed to reconstruct interac- 
tion networks in the fields of neuroscience, genetics, climate, 
ecology and many more. For the multivariate case we pro- 
pose to add two more basic properties: (3) causality, which 
means that the measure should give a non-zero value only to 
the dependency between lagged components of a multivariate 
process that are not independent conditional on the remaining 
process. (4) coupling strength autonomy, implying that also 
for dependent components we seek for a causal notion of cou- 
pling strength that is well interpretable, in that it is uniquely 
determined by the interaction of the two components alone 
and in a way autonomous of their interaction with the remain- 
ing process. To understand this, consider a simple example: 
Suppose we have two interacting processes X and Y and a 



third process Z, that drives both of them. Then a bivariate 
measure of coupling strength between X and Y will be influ- 
enced by the common input of Z, while our demand is, that 
the measure should be autonomous of the interactions of X 
and Y with Z. In an experimental setting this corresponds to 
keeping Z fixed and solely measuring the impact of a change 
in X on Y averaged over all realizations of Z. This property 
can be regarded as one ingredient of a multivariate extension 
of the equitability property. Last, we also demand that the 
measure should be defined in a way that is practically computable, 
in that the estimation does not, e.g., require somewhat 
arbitrary truncations like in the case of transfer entropy 
[3]. Due to these properties our approach can be used to reconstruct 
interaction networks where not only the links are 
causal, but are also meaningfully weighted and have the at- 
tribute of a coupling delay. This serves as an important feature 
in inferring physical mechanisms from interpreting interaction 
networks. 

The first requirement, generality, is fulfilled by any information-theoretic 
measure like mutual information (MI) and 
conditional mutual information (CMI) [4]. These measures 
also fulfill the axioms for dependency measures proposed in 
[5]. In addition to generality, the authors of [2] demonstrate 
that their algorithmically motivated maximal information coefficient 
fulfills the property of equitability. However, apart 
from issues with statistical power [6], a crucial drawback of 
their measure is that it is not clear how to extend it to the 
multivariate case. There are few works considering a concept 
of coupling strength in the multivariate context of causality. 
In [7, 8] this problem is approached in the linear framework 
of partial directed coherence and in [9, 10] using the less restricted, 
yet still model-based, concept of Granger causality, 
all sharing the problem that the model might be misspecified. 
Transfer entropy (TE) [3] is the information-theoretic 
analogue of Granger causality [11] and the issue of arbitrary 
truncations has been addressed in [12] and in our previous 
article [13]. Still, the problem with TE is that it is not lag-specific, 
which can lead to false interpretations as in the case 
of feedbacks [14], and, as we will demonstrate analytically and 
numerically in this article, it is not uniquely determined by 
the interaction of the two components alone and depends on 
misleading effects of, e.g., autodependency and the interaction 
with other processes. In essence, it does not fulfill the 
proposed property of coupling strength autonomy. In [15] another 
information-theoretic approach, based on a different set 
of postulates, is discussed. 

Figure 1: (Color online) TE and DTE for a multivariate example process as given by Eq. (13) that will be analytically analyzed in Sect. V. 
The time series graph is defined in Sect. III. (a) depicts the TE between the infinite past vector $X_t^-$ and $Y_t$ (black dots) conditioned on 
the remaining infinite past $\mathbf{X}_t^- \setminus X_t^-$ (gray dashed open box). (b) illustrates the first three summands of DTE given by Eq. (3). For the 
CMI between $X_{t-\tau}$ and $Y_t$ (black dots) only the finite set $S_{Y_t,X_{t-\tau}}$ (solid boxes) is needed to satisfy the Markov property (Eq. (2) 
in [13]). $S_{Y_t,X_{t-\tau}} \subset \mathbf{X}_t^- \setminus X_t^- \cup X_{t-\tau}^-$ (gray dashed open box) must be chosen so that it separates the remaining infinite conditions 
$(\mathbf{X}_t^- \setminus X_t^- \cup X_{t-\tau}^-) \setminus S_{Y_t,X_{t-\tau}}$ from $Y_t$ in the graph (for a formal definition of paths and separation see [20]). Since the separating sets depend 
on paths between $\mathbf{X}_t^- \setminus X_t^- \cup X_{t-\tau}^-$ and $Y_t$, they can only be determined after the time series graph has been estimated. 

Our approach to a measure of a causal coupling strength 
is based on the fundamental concept of source entropy [16], 
and for the special case of bivariate ordinal pattern time series 
the momentary information transfer (MIT) has been introduced 
recently in [17]. In this article we utilize the concept 
of graphical models to mathematically formalize and general- 
ize MIT to the multivariate case. We demonstrate that MIT 
is practically computable and fulfills the properties of gen- 
erality, causality and coupling strength autonomy, while the 
more complex property of equitability will only partially be 
addressed here. 

The determination of a causal coupling strength in our ap- 
proach is a two-step process. In the first step the graphical 
model is estimated as detailed in [13], which determines the 
existence or absence of a link and thus of a causality between 
lagged components of the multivariate process. The second 
step - the main topic of the present paper - is the estimation 
of MIT as a meaningful weight for every existing link in the 
graph. 

The article is organized as follows. In Sect. II we define 
and review TE and the decomposed transfer entropy introduced 
in [13]. In Sect. III we introduce the important concept 
of graphical models and in Sect. IV we define MIT and 
related measures. All of these measures are compared analytically 
(Sect. V), leading to the coupling strength autonomy 
theorem (Sect. VI), and numerically (Sect. VII). Finally, we 
discuss limitations (Sect. VIII) and provide an application to 
climatological data that shows the potential of our approach 
(Sect. IX). The appendices provide proofs and further discussions. 



II. TRANSFER ENTROPY AND THE CURSE OF 
DIMENSIONALITY 

Before introducing MIT, we will discuss the well-known 
TE and its shortcomings. We will focus on multivariate 
time series generated by discrete-time stochastic processes 
and use the following notation: Given a stationary multivariate 
discrete-time stochastic process $\mathbf{X}$, we denote its uni- 
or multivariate subprocesses $X, Y, Z, W, \ldots$ and the random 
variables at time $t$ as $\mathbf{X}_t, X_t, Y_t, \ldots$ Their pasts are defined as 
$X_t^- = (X_{t-1}, X_{t-2}, \ldots)$ and $\mathbf{X}_t^- = (\mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \ldots)$. For 
convenience, we will often treat $\mathbf{X}$, $\mathbf{X}_t$, $X_t^-$, and $\mathbf{X}_t^-$ as sets 
of random variables, so that, e.g., $X_t^-$ can be considered a 
subset of $\mathbf{X}_t^-$. Now the TE [see Fig. 1(a)] 

$$I^{\rm TE}_{X \to Y} = I(X_t^-; Y_t \mid \mathbf{X}_t^- \setminus X_t^-) \qquad (1)$$



is the reduction in uncertainty about $Y_t$ when learning the past 
of $X_t$, if the rest of the past of $\mathbf{X}_t$, given by $\mathbf{X}_t^- \setminus X_t^-$, is 
already known (where "$\setminus$" denotes the subtraction of a set). 
Note that, because of the assumed stationarity, $I^{\rm TE}_{X \to Y}$ is independent 
of $t$. TE measures the aggregated influence of $X$ 
at all past lags and is not lag-specific. The definition of TE 
leads to the problem that infinite-dimensional densities have 
to be estimated, which is commonly called the "curse of dimensionality". 
In the usual naive estimation of TE the infinite 
vectors are simply truncated at some $\tau_{\max}$, leading to 

$$I^{\rm TE}_{X \to Y} \approx I\left(X_t^{(t-1,\ldots,t-\tau_{\max})}; Y_t \mid \mathbf{X}_t^{(t-1,\ldots,t-\tau_{\max})} \setminus X_t^{(t-1,\ldots,t-\tau_{\max})}\right), \qquad (2)$$

where $X_t^{(t-1,\ldots,t-\tau_{\max})} = (X_{t-1}, \ldots, X_{t-\tau_{\max}})$ (correspondingly 
for $\mathbf{X}$) and $\tau_{\max}$ has to be chosen at least as large 
as the maximal coupling delay between $X$ and $Y$, which can 
lead to very large dimensions. In our numerical experiments 
we will demonstrate that the choice of a truncation lag $\tau_{\max}$, 
which affects the estimation dimension via $D = N \cdot \tau_{\max} + 1$ 
(where $N$ is the number of processes), has a strong influence 
on the value of TE and affects the reliability of causal inference. 
This is a huge disadvantage because the coupling delay 
should not have an influence on the measured coupling 
strength. 
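
To make the growth of the estimation dimension concrete, the following minimal sketch stacks the variables that enter the truncated TE of Eq. (2). The function name `truncated_te_vectors` and the random test data are illustrative assumptions and are not part of the original article.

```python
import numpy as np

def truncated_te_vectors(data, target, tau_max):
    """Stack Y_t with all N processes at lags 1..tau_max.

    data: array of shape (T, N), one column per process.
    Returns an array of shape (T - tau_max, N * tau_max + 1).
    """
    data = np.asarray(data, dtype=float)
    T, N = data.shape
    if not 1 <= tau_max < T:
        raise ValueError("tau_max must satisfy 1 <= tau_max < T")
    columns = [data[tau_max:, target]]            # Y_t
    for tau in range(1, tau_max + 1):
        # all processes at lag tau, aligned with Y_t
        columns.extend(data[tau_max - tau:T - tau, j] for j in range(N))
    return np.column_stack(columns)

rng = np.random.default_rng(0)
series = rng.normal(size=(500, 4))   # hypothetical data: T = 500, N = 4
joint = truncated_te_vectors(series, target=1, tau_max=5)
print(joint.shape)   # (495, 21): D = N * tau_max + 1 = 21
```

Already for four processes and a truncation lag of five, a density over 21 variables would have to be estimated.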

In [13] the problem of high dimensionality is overcome by 
utilizing the concept of graphical models that will be introduced 
in the next section. In this framework a decomposed 
transfer entropy (DTE) is derived that enables an estimation 
using finite vectors 

$$I^{\rm TE}_{X \to Y} \approx I^{\rm DTE}_{X \to Y} = \sum_{\tau=1}^{\tau^*} I(X_{t-\tau}; Y_t \mid S_{Y_t,X_{t-\tau}}) \qquad (3)$$

for a certain finite set $S_{Y_t,X_{t-\tau}} \subset \mathbf{X}_t^- \setminus X_t^- \cup X_{t-\tau}^-$ [see 
Fig. 1(b)] and with $\tau^*$ chosen as the smallest $\tau$ for which 
the estimated remainder is smaller than some given threshold. 
Another approach to find a truncation is described in [12]. 
While thereby the somewhat arbitrary truncation lag $\tau_{\max}$ is 
avoided and the estimation dimension is drastically reduced, it 
can still be quite high (in the still rather simple model example 
of [13] the maximum dimension was 24). 

The summands in Eq. (3) can be seen as the contributions 
of different lags to TE, but should not be interpreted as lag-specific 
causal contributions because they can be non-zero 
also for lags $\tau$ for which there is no link in the graph. Finally, 
apart from the issue of high dimensionality and lag-specific 
causality, we will demonstrate in Sect. V that neither TE nor DTE 
fulfills the proposed coupling strength autonomy property. 
In the next section we introduce the important concept 
of graphical models from which we derive MIT and related 
measures. 



III. GRAPHICAL MODELS AND CAUSALITY 

In the graphical model approach [18-20] the conditional independence 
properties of a multivariate process are visualized 
in a graph, in our case a time series graph. This graph thus 
encodes the lag-specific causality with respect to the observed 
process. As depicted in Figs. 1 and 2(b), each node in that 
graph represents a single random variable, i.e., a subprocess, 
at a certain time $t$. Nodes $X_{t-\tau}$ and $Y_t$ are connected by a 
directed link "$X_{t-\tau} \to Y_t$" pointing forward in time if and 
only if $\tau > 0$ and 

$$I^{\rm LINK}_{X \to Y}(\tau) = I(X_{t-\tau}; Y_t \mid \mathbf{X}_t^- \setminus \{X_{t-\tau}\}) > 0, \qquad (4)$$

i.e., if they are not independent conditionally on the past of 
the whole process, which implies a lag-specific causality with 
respect to $\mathbf{X}$. If $Y \neq X$ we say that the link "$X_{t-\tau} \to Y_t$" 
represents a coupling at lag $\tau$, while for $Y = X$ it represents 
an autodependency at lag $\tau$. Nodes $X_t$ and $Y_t$ are connected 
by an undirected contemporaneous link (visualized by a line) 
[20] if and only if 

$$I^{\rm LINK}_{X - Y} = I(X_t; Y_t \mid \mathbf{X}_{t+1}^- \setminus \{X_t, Y_t\}) > 0, \qquad (5)$$



where also the contemporaneous present $\mathbf{X}_t \setminus \{X_t, Y_t\}$ is included 
in the condition. In the case of a multivariate autoregressive 
process as defined later in Eq. (40), this definition 
corresponds to non-zero entries in the inverse covariance matrix 
of the innovations $\varepsilon$. Note that stationarity implies that 
"$X_{t-\tau} \to Y_t$" whenever "$X_{t'-\tau} \to Y_{t'}$" for any $t'$. 

Figure 2: (Color online) (a) Venn diagram that depicts the entropy 
$H(Y)$ at time $t$ (omitting $t$ and $\tau$ in the labels) as a segmented 
column bar. It is composed of the source entropy $H(Y \mid \mathcal{P}_Y)$ (dark 
gray shaded) and parts of the source entropy $H(X \mid \mathcal{P}_X)$ (light gray 
shaded), the entropy $H(\mathcal{P}_X)$ of the parents of $X$ (red), and the entropy 
$H(\mathcal{P}_Y \setminus \{X_{t-\tau}\})$ of the remaining parents of $Y$ (blue). Our 
CMI $I^{\rm MIT}_{X \to Y}$ (solid framed segment) is the difference between the entropy 
$H(Y \mid \mathcal{P}_Y \setminus \{X\}, \mathcal{P}_X)$ (dashed segment) that includes transfer 
from $X$ and the source entropy of $Y$ that excludes it. (b) shows an 
example of a time series graph (see definition in text) corresponding 
to Eq. (46) that makes the intuitive entropy picture operational. 
In this graph MIT is the CMI between $X_{t-\tau}$ at $\tau = 2$ and $Y_t$ 
(marked by the black dots) conditioned on the parents $\mathcal{P}_{X_{t-\tau}}$ (red) 
and $\mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}$ (blue). 

Like TE, the CMIs given by Eqs. (4) and (5) involve infinite-dimensional 
vectors and thus cannot be computed directly, 
but only by involving truncations. As shown in Sect. VII, this 
measure therefore suffers from the problem of high dimensionality 
and also theoretically does not fulfill the coupling 
strength autonomy property as analyzed in Sect. V. 

On the other hand, one can exploit the Markov property and 
use the finite set of parents defined as 

$$\mathcal{P}_{Y_t} = \{Z_{t-\tau} : Z \in \mathbf{X},\ \tau > 0,\ Z_{t-\tau} \to Y_t\} \qquad (6)$$

of $Y_t$ [blue box in Fig. 2(b)], which separate $Y_t$ from the past of 
the whole process $\mathbf{X}_t^- \setminus \mathcal{P}_{Y_t}$. The parents of all subprocesses 
in $\mathbf{X}$ together with the contemporaneous links comprise the 
time series graph. In [13] an algorithm for the estimation of 
these time series graphs by iteratively inferring the parents is 
introduced. In the Supplementary Material of [13] we also 
describe a suitable shuffle test and a detailed numerical study 
on the detection and false positive rates of the algorithm. The 
Markov properties hold for models satisfying the very general 
condition (S) in [20]. 

The determination of a causal coupling strength now is a 
two-step procedure. In the first step the time series graph is 
estimated as detailed in [13], which determines the existence 
or absence of a link and thus of a causality between lagged 
components of $\mathbf{X}$. The second step is the determination of a 
meaningful weight for every existing link in the graph. The 
MIT introduced in the next section is intended to serve this 
aim by attributing a well interpretable coupling strength solely 
to the inferred links of the time series graph. 



IV. MOMENTARY INFORMATION TRANSFER AND 
SOURCE ENTROPY 

The parents of a subprocess $Y$ at a certain time $t$ are key 
to understanding the underlying concept of source entropy. Each 
univariate subprocess $X$ of a stationary multivariate discrete-time 
stochastic process $\mathbf{X}$ will at each time $t$ yield a realization 
$X_t$. The entropy of $X_t$ measures the uncertainty about $X_t$ 
before its observation, and it will in general be reduced if a 
realization of the parents $\mathcal{P}_{X_t}$ is known. But for a non-deterministic 
process, and most real data will at least contain 
some random noise, there will always be some "surprise" left 
when observing $X_t$. This surprise gives us information and the 
expected information is called the source entropy $H(X_t \mid \mathcal{P}_{X_t})$ 
of $X$. Now the MIT between $X$ at some lagged time $t - \tau$ in 
the past and $Y$ at time $t$ is the CMI that measures the part of 
source entropy of $Y$ that is shared with the source entropy of 
$X$: 

$$I^{\rm MIT}_{X \to Y}(\tau) = I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}})$$
$$\qquad = H(Y_t \mid \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}) - H(Y_t \mid \mathcal{P}_{Y_t}). \qquad (7)$$



This approach of "isolating source entropies" is sketched in a 
Venn diagram in Fig. 2(a). The attribute momentary [17] is 
used because MIT measures the information of the "moment" 
$t - \tau$ in $X$ that is transferred to $Y_t$. This "momentariness" is 
closely related to the property of coupling strength autonomy 
as we will show in the next sections. Similarly to the definition 
of contemporaneous links in Eq. (5), we can also define a 
contemporaneous MIT 

$$I^{\rm MIT}_{X - Y} = I(X_t; Y_t \mid \mathcal{P}_{X_t}, \mathcal{P}_{Y_t}, \mathcal{N}_{X_t} \setminus \{Y_t\}, \mathcal{N}_{Y_t} \setminus \{X_t\}, \mathcal{P}(\mathcal{N}_{X_t} \setminus \{Y_t\}), \mathcal{P}(\mathcal{N}_{Y_t} \setminus \{X_t\})), \qquad (8)$$

where $\mathcal{N}$ denotes the contemporaneous neighbors given by 

$$\mathcal{N}_{Y_t} = \{X_t : X \in \mathbf{X},\ X_t - Y_t\} \qquad (9)$$

and correspondingly for $X$ and their parents. Due to Markov 
properties the contemporaneous MIT is equivalent to the formula 
defining contemporaneous links, Eq. (5). This is, however, 
not the case for the lagged MIT. Like any (C)MI, MIT 
is sensitive to any kind of statistical association and therefore 
guarantees the property of generality. Because MIT uses the 
parents $\mathcal{P}_{Y_t}$ as conditions, it also fulfills the property of lag-specific 
causality in that it is non-zero only for lagged processes 
that are not independent conditional on $\mathbf{X}_t^-$. 

As related measures, we can also choose either one of the 
parents as a condition, which, dropping the attribute "momentary", 
leads to the information transfers ITY and ITX 

$$I^{\rm ITY}_{X \to Y}(\tau) = I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}), \qquad (10)$$
$$I^{\rm ITX}_{X \to Y}(\tau) = I(X_{t-\tau}; Y_t \mid \mathcal{P}_{X_{t-\tau}}). \qquad (11)$$

ITY isolates only the source entropy of $Y$. Like MIT it is 
non-zero only for dependent nodes (and therefore fulfills the 
properties of generality and causality) and is used in the algorithm 
to estimate the time series graph [13]. ITX measures 
the part of source entropy in $X_{t-\tau}$ that reaches $Y_t$ on any path 
and is, thus, not a causal measure, yet in many situations we 
might only be interested in the effect of $X$ on $Y$, no matter 
how this influence is mediated. For $\tau > 0$ these three CMIs 
are related by the inequality 

$$I^{\rm ITX}_{X \to Y}(\tau) \leq I^{\rm MIT}_{X \to Y}(\tau) \leq I^{\rm ITY}_{X \to Y}(\tau), \qquad (12)$$

which holds under the "no sidepath"-constraint as specified in 
Sect. VI. The proof is given in the appendix. The very definition 
of MIT, ITY and ITX already leads to a low-dimensional 
estimation problem without arbitrary truncation parameters. 
Further, the underlying theory of time series graphs allows for 
an analytical evaluation of the properties of these measures 
as we will demonstrate in the following section. See [29] for 
software to compute the time series graph, MIT and related 
measures. 
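
The conditions of MIT, ITY and ITX follow mechanically from the parents in an estimated time series graph. The sketch below illustrates this bookkeeping; the dictionary layout, the helper names `shift` and `condition_sets`, and the encoding of a node as a `(name, lag)` pair are assumptions made for illustration and do not describe the software referenced above. The parents used in the example are those of the model process analyzed in Sect. V.

```python
def shift(nodes, lag):
    """Shift a set of (name, lag) nodes further into the past by `lag`."""
    return {(name, node_lag + lag) for name, node_lag in nodes}

def condition_sets(parents, x, y, tau):
    """Conditions of MIT, ITY and ITX for the link X_{t-tau} -> Y_t.

    parents[name] holds the parents of name_t as (parent_name, lag) pairs,
    so ("X", 2) in parents["Y"] stands for X_{t-2}.
    """
    source = (x, tau)
    if source not in parents[y]:
        raise ValueError("no link X_{t-tau} -> Y_t in the graph")
    parents_y = set(parents[y]) - {source}      # P_{Y_t} \ {X_{t-tau}}
    parents_x = shift(parents[x], tau)          # P_{X_{t-tau}}
    return {"MIT": parents_y | parents_x, "ITY": parents_y, "ITX": parents_x}

# Parents of the example process analyzed in Sect. V.
parents = {"X": {("X", 1)}, "Y": {("X", 2), ("W", 1)}, "Z": {("X", 1)}, "W": set()}
sets = condition_sets(parents, "X", "Y", tau=2)
print(sorted(sets["MIT"]))   # [('W', 1), ('X', 3)]
```

For the link from $X_{t-2}$ to $Y_t$ the MIT condition therefore contains only $W_{t-1}$ and $X_{t-3}$, independent of any truncation lag.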

To clarify, each of the CMIs introduced in the preceding 
sections is intended to measure a different aspect of the coupling 
between $X$ and $Y$. In the following analytical treatment 
of simple models we will discuss the interpretability of the 
different measures. 



V. ANALYTICAL COMPARISON 

To motivate our choice of a measure of coupling strength 
and to clarify the important coupling strength autonomy prop- 
erty, we discuss an analytically tractable model of a multivari- 
ate Gaussian process: 



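$$\begin{aligned}
Z_t &= c_{XZ} X_{t-1} + \eta^Z_t \\
X_t &= a_X X_{t-1} + \eta^X_t \\
Y_t &= c_{XY} X_{t-2} + c_{WY} W_{t-1} + \eta^Y_t \\
W_t &= \eta^W_t
\end{aligned} \qquad (13)$$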



with independent Gaussian white noise processes $\eta^{\cdot}_t$ with variances 
$\sigma^2_{\cdot}$. The corresponding time series graph is depicted in 
Fig. 1 and the parents are $\mathcal{P}_{Y_t} = \{X_{t-2}, W_{t-1}\}$ and $\mathcal{P}_{X_{t-2}} = \{X_{t-3}\}$. 
Generally, the conditional entropy $H(Y \mid Z)$ of a $D_Y$-dimensional 
Gaussian process $Y$ conditional on a (possibly 
multivariate) process $Z$ is given by 

$$H(Y \mid Z) = \frac{1}{2} \ln\left((2\pi e)^{D_Y} \frac{|\Gamma_{YZ}|}{|\Gamma_Z|}\right), \qquad (14)$$

where $|\Gamma_{YZ}|$ is the determinant of the covariance matrix of 
$(Y, Z)$. In our case $Y$ is univariate and thus $D_Y = 1$. The 
variances and covariances needed to evaluate the determinants 
and detailed derivations for the following formulas are given 
in the appendix. 
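
Equation (14) translates directly into a few lines of numerical code. The following sketch evaluates it with log-determinants; the function name, the index-based interface and the toy covariance matrix are illustrative assumptions.

```python
import numpy as np

def gaussian_conditional_entropy(cov, target, given=()):
    """H(Y | Z) in nats for jointly Gaussian variables, following Eq. (14)."""
    cov = np.asarray(cov, dtype=float)
    target, given = list(target), list(given)
    joint = target + given
    _, logdet_joint = np.linalg.slogdet(cov[np.ix_(joint, joint)])
    logdet_given = 0.0
    if given:
        _, logdet_given = np.linalg.slogdet(cov[np.ix_(given, given)])
    return 0.5 * (len(target) * np.log(2 * np.pi * np.e) + logdet_joint - logdet_given)

# Toy check: Y = 0.6 * X + noise, Var(X) = 1, noise variance 0.25
c, var_x, var_noise = 0.6, 1.0, 0.25
cov = np.array([[c**2 * var_x + var_noise, c * var_x],
                [c * var_x, var_x]])              # order: (Y, X)
h_cond = gaussian_conditional_entropy(cov, target=[0], given=[1])
print(np.isclose(h_cond, 0.5 * np.log(2 * np.pi * np.e * var_noise)))  # True
```

Conditioning on the only driver of $Y$ leaves exactly the entropy of the noise term, which is the source entropy picture used throughout this section.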

First, we analyze TE given by Eq. (1). TE can be written as 
the difference of conditional entropies 

$$I^{\rm TE}_{X \to Y} = H(Y_t \mid \mathbf{X}_t^- \setminus X_t^-) - H(Y_t \mid \mathbf{X}_t^-), \qquad (15)$$

where the latter entropy, conditioned on the whole infinite 
past, is actually the source entropy of $Y$ and can be computed much 
more easily by exploiting the Markov property 

$$H(Y_t \mid \mathbf{X}_t^-) = H(Y_t \mid \mathcal{P}_{Y_t}), \qquad (16)$$

which yields, using Eq. (14), 

$$H(Y_t \mid \mathcal{P}_{Y_t}) = \frac{1}{2} \ln\left(2\pi e\, \frac{|\Gamma_{Y_t X_{t-2} W_{t-1}}|}{|\Gamma_{X_{t-2} W_{t-1}}|}\right) = \frac{1}{2} \ln\left(2\pi e\, \sigma_Y^2\right). \qquad (17)$$

The source entropy of $Y$ is therefore given by the entropy of 
the innovation term $\eta^Y_t$. In the first entropy term, on the other 
hand, the infinite vector cannot be treated that easily and we 
have to evaluate the determinants of infinite-dimensional matrices 
in 

$$H(Y_t \mid Y_t^-, W_t^-, Z_t^-) = \frac{1}{2} \ln\left(2\pi e\, \frac{|\Gamma_{Y_t Y^- W^- Z^-}|}{|\Gamma_{Y^- W^- Z^-}|}\right). \qquad (18)$$

However, for the special case of $c_{XZ} = c_{WY} = 0$, i.e., no 
input processes apart from the autodependency in $X$, the quotient 
of these matrices can be simplified to the quotient of infinite 
Toeplitz matrices. As shown in the appendix, we can then 
apply Szegő's theorem [21, 22] and get 



7-TE cxz=cwY=0 



1 



In 1 



(4y^l)/(i- 



4) 



Y 



(19) 



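Another tractable case is $a_X = 0$, for which the blocks of the 
covariance matrix $\Gamma$ become diagonal and 

$$I^{\rm TE}_{X \to Y}\Big|_{a_X = 0} = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2 \sigma_Z^2}{\sigma_Y^2 \left(c_{XZ}^2 \sigma_X^2 + \sigma_Z^2\right)}\right). \qquad (20)$$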



Thus, in the first case the value of TE for our model depends 
on the autodependency coefficient and in the second case on 
the coupling coefficient and variance of Z. But why should 
a measure of coupling strength between X and Y depend on 
internal dynamics of X and, even more so, on the interaction 
of X with another process Z? While it can be information- 
theoretically explained, it seems rather unintuitive for a mea- 
sure of coupling strength between X and Y. 

Next, we compute the CMI $I^{\rm LINK}_{X \to Y}$ that defines links in a 
time series graph. Writing Eq. (4) for $\tau = 2$ as a difference 
of conditional entropies, the second term is again the source 
entropy as given by Eq. (17) and in this case also the first 
entropy can be simplified using the Markov property 



H{Yt\X-\Xt-2)=H{Yt\X.^ 



\{Xt-2}) 

(21) 



to arrive at a finite covariance matrix from which a lengthy 
computation yields 



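$$I^{\rm LINK}_{X \to Y}(\tau = 2) = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2 \sigma_Z^2}{\sigma_Y^2 \left(c_{XZ}^2 \sigma_X^2 + (1 + a_X^2)\, \sigma_Z^2\right)}\right). \qquad (22)$$

Again, this measure of coupling strength also depends on the 
coefficients belonging to other coupling and autodependency 
links. 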
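
The dependence of the link-defining CMI on $a_X$ and on the coupling to $Z$ can be checked numerically. The sketch below builds the exact covariance matrix of the relevant variables of the model in Eq. (13) and evaluates the Gaussian CMI from determinants. It rests on an assumption that is not spelled out in the text: that by the Markov structure of this model, conditioning on $X_{t-3}$, $X_{t-1}$, $Z_{t-1}$ and $W_{t-1}$ is equivalent to conditioning on the whole remaining past. The parameter values are illustrative.

```python
import numpy as np

def gaussian_cmi(cov, x, y, cond):
    """I(X; Y | cond) in nats for jointly Gaussian variables."""
    def logdet(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1] if idx else 0.0
    return 0.5 * (logdet([x] + cond) + logdet([y] + cond)
                  - logdet(cond) - logdet([x, y] + cond))

# Illustrative parameter values (not taken from the article).
a_x, c_xy, c_xz, c_wy = 0.7, 0.5, 0.8, 0.4
s_x, s_y, s_z, s_w = 1.0, 0.6, 0.9, 1.2     # innovation variances

# Independent sources: X_{t-3}, eta^X_{t-2}, eta^X_{t-1}, eta^Z_{t-1}, eta^W_{t-1}, eta^Y_t
source_var = np.diag([s_x / (1 - a_x**2), s_x, s_x, s_z, s_w, s_y])
# Rows: X_{t-3}, X_{t-2}, X_{t-1}, Z_{t-1}, W_{t-1}, Y_t as linear maps of the sources
A = np.array([
    [1, 0, 0, 0, 0, 0],
    [a_x, 1, 0, 0, 0, 0],
    [a_x**2, a_x, 1, 0, 0, 0],
    [c_xz * a_x, c_xz, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [c_xy * a_x, c_xy, 0, 0, c_wy, 1],
])
cov = A @ source_var @ A.T

numeric = gaussian_cmi(cov, x=1, y=5, cond=[0, 2, 3, 4])
closed = 0.5 * np.log(1 + c_xy**2 * s_x * s_z
                      / (s_y * (c_xz**2 * s_x + (1 + a_x**2) * s_z)))
print(np.isclose(numeric, closed))   # True
```

Changing `a_x` or `c_xz` in this sketch changes the value, although the link from $X_{t-2}$ to $Y_t$ itself is untouched.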

Figure 3: (Color online) Two examples of couplings that cannot be 
related to one single coefficient $c_{XY}$. Black dots mark $X_{t-\tau}$ and 
$Y_t$, the red and blue boxes their parents. (a) A sidepath, i.e., there 
exists a path from $X_{t-2}$ to some parent of $Y_t$. Then the coupling 
cannot be related to one single link, but additionally to the path via 
$W_{t-1}$. (b) Visualization of a nonlinear coupling between $X_{t-1}$ and 
$Y_t$. In this case the entropies of $X_{t-1}$ and its parents "mix" and the 
coupling should be considered as emanating from $(X_{t-1}, \mathcal{P}_{X_{t-1}})$ 
rather than $X_{t-1}$ alone. 

We now turn to the measures that solely use the parents as 
conditions, which has the analytical and numerical advantage 
of low-dimensional computations. The resulting expressions 
for the CMI with no conditions, i.e., the mutual information 
(MI), and for either one of the parents as a condition for $\tau = 2$ 
are 

$$I^{\rm MI}_{X \to Y} = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2 / (1 - a_X^2)}{\sigma_Y^2 + c_{WY}^2 \sigma_W^2}\right), \qquad (23)$$
$$I^{\rm ITY}_{X \to Y} = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2 / (1 - a_X^2)}{\sigma_Y^2}\right), \qquad (24)$$
$$I^{\rm ITX}_{X \to Y} = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2}{\sigma_Y^2 + c_{WY}^2 \sigma_W^2}\right). \qquad (25)$$

Thus MI depends on the coefficients and variances of the input 
processes, while ITX and ITY still depend at least on the 
coefficient and variance of the process that is not conditioned 
on. Contrary to TE and LINK though, none of the three 
measures depends on the interaction with $Z$. In our model the 
inputs to $X$ and $Y$, i.e., the autodependency with $X_{t-3}$ and 
the external input from $W_{t-1}$, are independent, which makes 
the formulas much simpler. 

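Finally, the MIT for $\tau = 2$ is 

$$I^{\rm MIT}_{X \to Y} = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2}{\sigma_Y^2}\right), \qquad (26)$$

which solely depends on the model coefficients that govern the 
source entropies, i.e., the variances $\sigma_X^2$, $\sigma_Y^2$, and the coupling 
coefficient $c_{XY}$. 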
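
The contrast between the four low-dimensional measures can be made tangible by evaluating their closed forms for the model in Eq. (13) at lag 2. The following sketch is a plain transcription of these Gaussian expressions; the function name and the parameter values are illustrative assumptions.

```python
import math

def coupling_measures(c_xy, a_x, c_wy, var_x, var_y, var_w):
    """Closed-form MI, ITY, ITX and MIT (in nats) at lag 2 for the Gaussian example."""
    signal = c_xy**2 * var_x                  # contribution of eta^X_{t-2} to Y_t
    total_x = signal / (1 - a_x**2)           # contribution of the full X_{t-2}
    noise_open = var_y + c_wy**2 * var_w      # noise in Y_t when W_{t-1} is not conditioned on
    return {
        "MI": 0.5 * math.log(1 + total_x / noise_open),
        "ITY": 0.5 * math.log(1 + total_x / var_y),
        "ITX": 0.5 * math.log(1 + signal / noise_open),
        "MIT": 0.5 * math.log(1 + signal / var_y),
    }

weak = coupling_measures(c_xy=0.5, a_x=0.2, c_wy=0.4, var_x=1.0, var_y=1.0, var_w=1.0)
strong = coupling_measures(c_xy=0.5, a_x=0.9, c_wy=0.4, var_x=1.0, var_y=1.0, var_w=1.0)
print(weak["ITX"] <= weak["MIT"] <= weak["ITY"])   # True
print(math.isclose(weak["MIT"], strong["MIT"]))    # True: MIT ignores a_x
print(strong["ITY"] > weak["ITY"])                 # True: ITY grows with a_x
```

Only MIT stays fixed when the autodependency of $X$ is varied while the coupling coefficient is held constant.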

This equation can be proven to hold for arbitrary multivariate 
linear autoregressive processes under the "no sidepath"-constraint 
specified in the next section. More generally, for 
a class of additive models MIT depends only on the coupling 
coefficient $c_{XY}$ and the source variances of $\eta^X$ and $\eta^Y$, as will 
be proven in the coupling strength autonomy theorem in the 
next section. 

But can a coupling strength always be associated with only 
one coupling coefficient $c_{XY}$? In the following, still linear, 
example model visualized in Fig. 3(a) this is not the case: 

$$\begin{aligned}
W_t &= c_{XW} X_{t-1} + \eta^W_t \\
Y_t &= c_{XY} X_{t-2} + c_{WY} W_{t-1} + \eta^Y_t
\end{aligned} \qquad (27)$$

where the influence of $X_{t-2}$ on $Y_t$ has two paths: one via 
the direct coupling link "$X_{t-2} \to Y_t$" and one via the path 
"$X_{t-2} \to W_{t-1} \to Y_t$", such that we can rewrite 

$$Y_t = c_{XY} X_{t-2} + c_{WY} \left(c_{XW} X_{t-2} + \eta^W_{t-1}\right) + \eta^Y_t, \qquad (28)$$

from which we see that the coupling cannot be unambiguously 
related to one coefficient. Here, MIT at $\tau = 2$ is 

$$I^{\rm MIT}_{X \to Y} = \frac{1}{2} \ln\left(1 + \frac{c_{XY}^2 \sigma_X^2 \sigma_W^2}{\sigma_Y^2 \left(c_{XW}^2 \sigma_X^2 + \sigma_W^2\right)}\right) \qquad (29)$$

and depends not only on $c_{XY}$, but also on the coefficient $c_{XW}$ 
of the link "$X_{t-2} \to W_{t-1}$", and on the variance $\sigma_W^2$ of $\eta^W$. In 
this case it might be more appropriate to "leave open" both 
paths and exclude $W_{t-1}$ from the conditions, which (only in 
this case) reduces the modified MIT to the MI 

$$I(X_{t-2}; Y_t) = \frac{1}{2} \ln\left(1 + \frac{(c_{XY} + c_{XW} c_{WY})^2 \sigma_X^2}{\sigma_Y^2 + c_{WY}^2 \sigma_W^2}\right). \qquad (30)$$

Here the sum $c_{XY} + c_{XW} c_{WY}$ is the covariance along both 
paths, which can also vanish for $c_{XY} = -c_{XW} c_{WY}$, and 
seems like a more appropriate representation of the coupling 
between $X_{t-2}$ and $Y_t$. 
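
The cancellation of the two paths can be illustrated with the Gaussian closed forms of this sidepath example. The sketch below assumes, beyond what is stated in the text, that $X$ has no parents of its own and variance `var_x`; the function names and parameter values are illustrative.

```python
import math

def sidepath_mit(c_xy, c_xw, var_x, var_y, var_w):
    """I(X_{t-2}; Y_t | W_{t-1}) in nats for the Gaussian sidepath example."""
    # variance of X_{t-2} that remains after conditioning on W_{t-1}
    residual_x = var_x * var_w / (var_w + c_xw**2 * var_x)
    return 0.5 * math.log(1 + c_xy**2 * residual_x / var_y)

def sidepath_mi(c_xy, c_xw, c_wy, var_x, var_y, var_w):
    """I(X_{t-2}; Y_t) in nats with both paths left open."""
    total = c_xy + c_xw * c_wy
    return 0.5 * math.log(1 + total**2 * var_x / (var_y + c_wy**2 * var_w))

# Direct link and sidepath cancel: c_xy = -c_xw * c_wy
mit = sidepath_mit(c_xy=-0.4, c_xw=0.8, var_x=1.0, var_y=1.0, var_w=1.0)
mi = sidepath_mi(c_xy=-0.4, c_xw=0.8, c_wy=0.5, var_x=1.0, var_y=1.0, var_w=1.0)
print(mit > 0.0)                              # True: the direct link is still detected
print(math.isclose(mi, 0.0, abs_tol=1e-12))   # True: the net effect of X on Y vanishes
```

MIT thus answers the question of whether and how strongly the direct link acts, while the MI with both paths open reflects the net influence.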

Another example where one cannot unambiguously relate 
the coupling strength to one coefficient is a nonlinear dependency 
between $X$ and $Y$ [Fig. 3(b)]: 

$$\begin{aligned}
Z_t &= \eta^Z_t \\
X_t &= c_{ZX} Z_{t-1} + \eta^X_t \\
Y_t &= c_{XY} (X_{t-1})^2 + \eta^Y_t.
\end{aligned} \qquad (31)$$



If we express $Y_t$ explicitly in terms of the source process of 
$X$ and the parent of $X$ 

$$Y_t = c_{XY} c_{ZX}^2 (Z_{t-2})^2 + 2 c_{ZX} c_{XY} Z_{t-2}\, \eta^X_{t-1} + c_{XY} (\eta^X_{t-1})^2 + \eta^Y_t, \qquad (32)$$

we note that due to the term $2 c_{ZX} c_{XY} Z_{t-2}\, \eta^X_{t-1}$ the effect of 
$Z_{t-2}$ is not additively separable from the source process $\eta^X_{t-1}$. 
In the Venn diagram of Fig. 2(a) this "mixing" of entropies implies 
that the parts of the entropies $H(X \mid \mathcal{P}_X)$ and $H(\mathcal{P}_X)$ that 
overlap with $H(Y)$ are not distinguishable anymore, which 
could be visualized by the red and light gray shadings bleeding 
into one another. Therefore the coupling should be considered 
as emanating from $(X_{t-1}, \mathcal{P}_{X_{t-1}})$ rather than $X_{t-1}$ 
alone [visualized by a thick arrow in Fig. 3(b)]. For this nonlinear 
model we have not found an analytical expression for 
MIT, but the more general case of this model is studied numerically 
in the appendix. 

These two examples point to constraints under which full 
coupling strength autonomy can be reached. In the next section 
we will formalize these constraints to general conditions 
in a theorem of coupling strength autonomy. 



VI. COUPLING STRENGTH AUTONOMY THEOREM 
AND MODIFICATIONS OF MIT 

Let $X$, $Y$ be two subprocesses of some multivariate stationary 
discrete-time process $\mathbf{X}$ satisfying condition (S) in [20] 
with time series graph $G$ as defined in Sect. III and coupling 
link "$X_{t-\tau} \to Y_t$" for $\tau > 0$. The following derivations also 
hold for more than one link at lags $\tau' \neq \tau$ between $X$ and $Y$. 
As before, we denote their parents $\mathcal{P}_{Y_t}$ and $\mathcal{P}_{X_t}$. For the link 
"$X_{t-\tau} \to Y_t$" we define the following conditions: 

1. Additivity means that the dependence of $X_t$ on its 
source process $\eta^X_t$ and parents $\mathcal{P}_{X_t}$ and of $Y_t$ on its 
source process $\eta^Y_t$, $X_{t-\tau}$ and the remaining parents 
$\mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}$ is additive, i.e., they can be written as 

$$X_t = g_X(\mathcal{P}_{X_t}) + \eta^X_t \qquad (33)$$
$$Y_t = f(X_{t-\tau}) + g_Y(\mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}) + \eta^Y_t \qquad (34)$$

for possibly multivariate random variables $\mathcal{P}_{X_t}$ and 
$\mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}$, univariate i.i.d. random variables $\eta^X_t$ 
and $\eta^Y_t$ with arbitrary, not necessarily identical distributions, 
and arbitrary functions $g_Y$, $g_X$, $f$. 

2. Linearity in $f$: The dependence of $Y_t$ on $X_{t-\tau}$ is linear, 
i.e., $f(x) = cx$ with real $c$. 

3. "No sidepath"-constraint, i.e., in the time series graph 
$G$ the node $X_{t-\tau}$ is separated from $(\mathcal{P}_{Y_t} \setminus \mathcal{P}_{X_{t-\tau}}) \setminus 
\{X_{t-\tau}\}$ given $\mathcal{P}_{X_{t-\tau}}$ (for a formal definition of paths 
and separation see [20]). Due to condition (S) in 
[20], separation implies conditional independence: 

$$I\left((\mathcal{P}_{Y_t} \setminus \mathcal{P}_{X_{t-\tau}}) \setminus \{X_{t-\tau}\}; X_{t-\tau} \mid \mathcal{P}_{X_{t-\tau}}\right) = 0. \qquad (35)$$

Theorem (Coupling Strength Autonomy). MIT defined 
in Eq. (7) for the coupling link "$X_{t-\tau} \to Y_t$" for $\tau > 0$ 
of a multivariate stationary discrete-time process $\mathbf{X}$ satisfying 
condition (S) in [20] has the following dependency properties: 

1. If all three conditions (1)-(3) hold, then MIT can be expressed 
as an MI of the source processes: 

$$I^{\rm MIT}_{X \to Y}(\tau) = I\left(\eta^X_{t-\tau};\ c\, \eta^X_{t-\tau} + \eta^Y_t\right). \qquad (36)$$

Since $\eta^Y_t$ and $\eta^X_{t-\tau}$ are assumed to be independent, the 
probability density of their sum is given by their convolution. 
The MIT thus depends solely on $c$ and the joint 
and marginal distributions of $\eta^X_{t-\tau}$ and the convolution 
of $c\, \eta^X_{t-\tau}$ with $\eta^Y_t$. 

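2. If only conditions (1) and (2) hold, i.e., there exists a 
sidepath between $X_{t-\tau}$ and some nodes in $\mathcal{P}_{Y_t} \setminus \mathcal{P}_{X_{t-\tau}}$, 
then MIT depends additionally on the distributions of at 
least the "sidepath-parents" in $\mathcal{P}_{Y_t}$ and their functional 
dependence on $Y_t$: 

$$I^{\rm MIT}_{X \to Y}(\tau) = I\left(\eta^X_{t-\tau};\ c\, \eta^X_{t-\tau} + \eta^Y_t \mid \mathcal{P}_{X_{t-\tau}}, \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}\right). \qquad (37)$$

This relation can be further simplified if $g_Y(\mathcal{P}_{Y_t} \setminus 
\{X_{t-\tau}\})$ is additive in some parents. 

3. If only the additivity condition (1) holds, i.e., $f(x)$ is 
nonlinear and mixes $\eta^X_{t-\tau}$ with the parents $\mathcal{P}_{X_{t-\tau}}$, then 
MIT depends additionally on $f$, the distributions of 
variables in $\mathcal{P}_{X_{t-\tau}}$ as well as $\mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}$ and their 
functional dependencies on $Y_t$: 

$$I^{\rm MIT}_{X \to Y}(\tau) = I\left(\eta^X_{t-\tau};\ f\left(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})\right) + \eta^Y_t \mid \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}\right). \qquad (38)$$

This relation can be further simplified if some parents in 
$\mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}$ are independent of $f(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}}))$. 

For a contemporaneous link "$X_t - Y_t$" the contemporaneous 
MIT defined in Eq. (8) under the condition (1) is: 

$$I^{\rm MIT}_{X - Y} = I\left(\eta^X_t; \eta^Y_t \mid \mathcal{N}_{X_t} \setminus \{Y_t\}, \mathcal{N}_{Y_t} \setminus \{X_t\}\right). \qquad (39)$$

A contemporaneous link cannot have sidepaths. For $X = Y$ 
MIT measures the autodependency strength. The proofs are 
given in the appendix. 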

We now discuss some remarks on the theorem and possible 
modifications of MIT: 

i) For the special case of multivariate linear autoregressive processes of order $p$ [23]
defined by

$$\mathbf{X}_t = \sum_{s=1}^{p} \Phi(s)\,\mathbf{X}_{t-s} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Sigma), \tag{40}$$

with the coupling coefficient $c_{XY}$ at lag $\tau$ corresponding to the connectivity matrix
entry $\Phi(\tau)_{YX}$, and with no sidepaths, Eq. (36) leads to

$$I^{\rm MIT}_{X\to Y}(\tau) = \frac{1}{2}\ln\left(1 + \frac{c_{XY}^2\,\sigma_X^2}{\sigma_Y^2}\right), \tag{41}$$

generalizing the MIT for our analytical model in Eq. (26). For an autodependency at lag
$\tau$ with coefficient $a_Y$ and no sidepaths the MIT is
$I_{Y\to Y}(\tau) = \frac{1}{2}\ln\left(1 + a_Y^2\right)$, independent of the source variance
$\sigma_Y^2$.
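As a small worked example of remark i), the following sketch evaluates the closed form of Eq. (41) for a few coupling coefficients. The function names and the default unit variances are illustrative assumptions, not part of the original analysis; values are in nats.

```python
import math

def mit_linear_gaussian(c_xy, var_x=1.0, var_y=1.0):
    """MIT of a linear link without sidepaths, using the form of Eq. (41)."""
    return 0.5 * math.log(1.0 + c_xy ** 2 * var_x / var_y)

def mit_autodependency(a_y):
    """MIT of a linear autodependency with coefficient a_y and no sidepaths."""
    return 0.5 * math.log(1.0 + a_y ** 2)

if __name__ == "__main__":
    for c in (0.0, 0.3, 0.6, 0.9):
        print(f"c_XY = {c:.1f}: MIT = {mit_linear_gaussian(c):.3f} nats")
```

For $c_{XY} = 0.6$ and unit variances this gives about 0.15, the analytical value quoted in the numerical comparison. Scaling both source variances by the same factor leaves the result unchanged, since only their ratio enters.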



ii) The form of Eq. (41) is reminiscent of the Shannon-Hartley theorem in communication
theory [4]. There the coupling strength corresponds to the communication channel capacity
$C$, which is given by the maximum MI over all possible input sources:
$C = \max_{p(x)} I(X;Y)$. The Shannon-Hartley theorem for Gaussian channels then reads

$$C = B \log\left(1 + \frac{S}{N}\right), \tag{42}$$

with bandwidth $B$ and signal-to-noise ratio $S/N$, which in Eq. (41) corresponds to
$c_{XY}^2\sigma_X^2/\sigma_Y^2$. The difference to our measure of coupling strength is that
we cannot manipulate the input sources and thus cannot measure the channel capacity alone.
We also expressed the various other CMIs occurring above in this form, where the quotient
can be interpreted as a signal-to-noise ratio. For example, in


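iii) For sidepaths, i.e., under the conditions (1) and (2) only, the example of MIT and the
modified MIT for the case of our model example Eq. (27) suggests that it might be more
appropriate to "leave open" all paths from $X_{t-\tau}$ to $Y_t$ by excluding those parents
of $Y_t$ that depend on $X_{t-\tau}$, i.e.,

$$\mathcal{P}^S_{Y_t} = \left\{ W^i_{t-\tau_i} \in \mathcal{P}_{Y_t}\setminus\mathcal{P}_{X_{t-\tau}} \,:\, I\left(W^i_{t-\tau_i}; X_{t-\tau} \,\middle|\, \mathcal{P}_{X_{t-\tau}}\right) > 0 \right\}, \tag{43}$$

but additionally including the parents $\mathcal{P}(\mathcal{P}^S_{Y_t})$ of these sidepath
parents. In this way the couplings via the direct link "$X_{t-\tau} \to Y_t$" and via the
path from $X_{t-\tau}$ through the sidepath parents $\mathcal{P}^S_{Y_t}$ to $Y_t$ (where
the link from $X_{t-\tau}$ to the sidepath parents can either be directed or
contemporaneous) are isolated from the effects of their parents. The modified MIT we call
MITS where "S" stands for "sidepath":

$$I^{\rm MITS}_{X\to Y}(\tau) \equiv I\left(X_{t-\tau}; Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{\mathcal{P}^S_{Y_t}, X_{t-\tau}\},\; \mathcal{P}(\mathcal{P}^S_{Y_t})\setminus\{X_{t-\tau}\},\; \mathcal{P}_{X_{t-\tau}}\right). \tag{44}$$

iv) For nonlinear dependencies $f$ one could modify MIT to the CMI between $Y_t$ and the
joint vector $(X_{t-\tau}, \mathcal{P}_{X_{t-\tau}})$ leading to MITN where "N" stands for
"nonlinear":

$$I^{\rm MITN}_{X\to Y}(\tau) \equiv I\left((X_{t-\tau}, \mathcal{P}_{X_{t-\tau}}); Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}, \mathcal{P}_{X_{t-\tau}}\}\right). \tag{45}$$

These modifications will be studied in a separate paper.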

The theorem implies that under the conditions (1)-(3) the MIT is independent of other
coefficients belonging to other links. If this holds for all coupling strengths of all links
in the model, then the MITs are independent in a functional sense. Note, however, that all
coupling strengths of links emanating from the same process $X$ will depend on the source
variance of $\eta^X$. Thus, MIT somewhat disentangles the coupling structure, which is
exactly the coupling strength autonomy that makes MIT well interpretable as a measure that
solely depends on the "coupling mechanism" between $X_{t-\tau}$ and $Y_t$, autonomous of
other processes. One such possible misleading input "filtered out" by MIT is autocorrelation,
or, more generally, autodependency, as will be shown in the numerical experiments and the
application to climatological data. In the next section we investigate the coupling strength
autonomy property numerically.



VII. NUMERICAL COMPARISON 

In the following we compare MI, TE, MIT and related measures numerically to investigate the
properties of generality and coupling strength autonomy for a general class of nonlinear
discrete-time stochastic multivariate processes:

$$\begin{aligned}
Z_t &= a_Z Z_{t-1} + \eta^Z_t \\
X_t &= a_X X_{t-1} + c_{ZX}\, g(Z_{t-1}) + \eta^X_t \\
Y_t &= a_Y Y_{t-1} + c_{WY}\, g(W_{t-1}) + c_{XY}\, f(X_{t-2}) + \eta^Y_t \\
W_t &= a_W W_{t-1} + \eta^W_t
\end{aligned} \tag{46}$$

Eq. (25) $c_{XY}^2\sigma_X^2$ is the signal strength and $c_{WY}^2\sigma_W^2 + \sigma_Y^2$
is the noise strength.

with independent Gaussian white noise processes $\eta^{\cdot}_t$ with all variances
$\sigma^2_{\cdot} = 1$. The corresponding time series graph is depicted in Fig. 2(b). We
estimate the various coupling measures for fixed $c_{XY}$ and $a_Z = a_W = 0.5$ and vary the
input coefficients

$$a_X = c_{ZX} \in \{0.0, 0.1, \ldots, 0.8\}, \qquad a_Y = c_{WY} \in \{0.0, 0.1, \ldots, 0.8\}$$

and functional dependencies of inputs

linear: $g(x) = x$,
squared: $g(x) = 0.3\, x^2$,
stochastic: $g(x) = 2x\,\varepsilon_t$ with uniform i.i.d. $\varepsilon_t \in [0, 1]$,
exponential: $g(x) = 0.3 \cdot 2^x$,
sinusoidal: $g(x) = \sin 4x$.

Here we depict results for linear $f(x) = x$ such that the multivariate process satisfies
all three conditions; a nonlinear dependency type is discussed in the appendix. The ensemble
$E$ then consists of all combinations of input coefficients and functional forms, each
combination run with 120 trials. The CMIs are estimated using a nearest-neighbor (kNN)
estimator [25, 26] with parameter $k = 1$ (small values of $k$ lead to a lower estimation
bias but higher variance [25, 26]).
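The ensemble setup can be sketched in code as follows. This is a minimal illustration of how one realization of the model in Eq. (46) with linear $f(x) = x$ might be generated; the function names, the burn-in length, the seed handling, and the reading of the exponential input as $0.3 \cdot 2^x$ are assumptions made here, not details taken from the original experiments.

```python
import math
import random

def make_g(kind, rng):
    """Return one of the five input dependencies g(x) listed in the text."""
    forms = {
        "linear": lambda x: x,
        "squared": lambda x: 0.3 * x * x,
        # a fresh uniform factor in [0, 1) is drawn on every call
        "stochastic": lambda x: 2.0 * x * rng.random(),
        "exponential": lambda x: 0.3 * 2.0 ** x,
        "sinusoidal": lambda x: math.sin(4.0 * x),
    }
    return forms[kind]

def simulate(T, a_x, c_zx, a_y, c_wy, c_xy, g_kind="linear",
             a_z=0.5, a_w=0.5, burn_in=200, seed=0):
    """Simulate Z, X, Y, W with linear f(x) = x and unit-variance Gaussian noise."""
    rng = random.Random(seed)
    g = make_g(g_kind, rng)
    n = T + burn_in
    Z, X, Y, W = ([0.0] * n for _ in range(4))
    for t in range(2, n):
        Z[t] = a_z * Z[t - 1] + rng.gauss(0.0, 1.0)
        W[t] = a_w * W[t - 1] + rng.gauss(0.0, 1.0)
        X[t] = a_x * X[t - 1] + c_zx * g(Z[t - 1]) + rng.gauss(0.0, 1.0)
        Y[t] = (a_y * Y[t - 1] + c_wy * g(W[t - 1])
                + c_xy * X[t - 2] + rng.gauss(0.0, 1.0))
    # discard the burn-in so that the returned samples are close to stationary
    return {name: s[burn_in:] for name, s in zip("ZXYW", (Z, X, Y, W))}

if __name__ == "__main__":
    data = simulate(T=1000, a_x=0.4, c_zx=0.4, a_y=0.4, c_wy=0.4,
                    c_xy=0.6, g_kind="sinusoidal")
    print(len(data["X"]), data["Y"][:3])
```

Looping this function over the coefficient grid, the five forms of $g$, and 120 seeds per combination would reproduce the structure of the ensemble $E$ described above.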

In the top panel of Fig. 4(a) we plot the ensemble average $\langle I\rangle_E$ for fixed
$c_{XY} = 0.6$ for the following measures with $\tau = 2$: MI $I(X_{t-\tau}; Y_t)$ (gray
with dotted line), ITY (green with dash-dotted line), ITX (blue with dashed line) and MIT
(red with solid line), each as defined earlier. The parents are shown in Fig. 2(b).

MIT is largely invariant to changes of the remaining coefficients and $g(x)$ and
approximately attains the analytical value for zero input coefficients [given by Eq. (26)
for $c_{XY} = 0.6$ and $\sigma_X^2 = \sigma_Y^2 = 1$]: $I \approx 0.15$. This implies that
the MIT of the coupling link is autonomous of the MITs corresponding to the input links
$Z \to X$ for $Z \in \mathcal{P}_X$ and $W \to Y$ for $W \in \mathcal{P}_Y\setminus\{X\}$,
which scale with these coefficients. Note, however, that all coupling strengths of links
emanating from the same process will depend on its variance $\sigma_X^2$ like in Eq. (26).
Further, MI is mostly larger, but can also be smaller than MIT, which can be explained with
the entropy diagram in Fig. 2(a): larger MIs occur if the entropy is increased due to a
larger input of $H(\mathcal{P}_X)$ and smaller MIs occur if the relative shared part of
$H(X)$ in $H(Y)$ decreases due to a larger input of $H(\mathcal{P}_Y)$. For zero inputs, MI
approaches the analytical value $I \approx 0.15$, to which all four measures converge. ITY
can at least exclude input to $Y$ and ITX can exclude input to $X$. Note, however, that the
dependence of ITX and ITY on the input coefficients can be different in other models. The
average of ITX (ITY) is always smaller (larger) than or equal to MIT, confirming the
inequality Eq. (12).
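The qualitative behavior described above can be illustrated for the purely linear case ($g(x) = f(x) = x$). The sketch below does not use the kNN estimator of the experiments; it swaps in a Gaussian plug-in estimate, $-\frac{1}{2}\ln(1 - \rho^2)$ with $\rho$ the partial correlation, which is only valid under the assumption of joint Gaussianity. All function names, the sample length, and the seed are illustrative assumptions.

```python
import math
import random

def simulate_linear(T, a_x, c_zx, a_y, c_wy, c_xy, seed=0, burn_in=200):
    """Linear version of the model (g(x) = f(x) = x, a_Z = a_W = 0.5)."""
    rng = random.Random(seed)
    n = T + burn_in
    X, Y, Z, W = ([0.0] * n for _ in range(4))
    for t in range(2, n):
        Z[t] = 0.5 * Z[t - 1] + rng.gauss(0.0, 1.0)
        W[t] = 0.5 * W[t - 1] + rng.gauss(0.0, 1.0)
        X[t] = a_x * X[t - 1] + c_zx * Z[t - 1] + rng.gauss(0.0, 1.0)
        Y[t] = a_y * Y[t - 1] + c_wy * W[t - 1] + c_xy * X[t - 2] + rng.gauss(0.0, 1.0)
    return [s[burn_in:] for s in (X, Y, Z, W)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [u - m for u in v]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            fac = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x

def residual(y, cols):
    """Residual of y after least-squares regression on the columns in cols."""
    yc, cc = center(y), [center(c) for c in cols]
    if not cc:
        return yc
    beta = solve([[dot(ci, cj) for cj in cc] for ci in cc], [dot(ci, yc) for ci in cc])
    return [yt - sum(b * c[t] for b, c in zip(beta, cc)) for t, yt in enumerate(yc)]

def gauss_cmi(x, y, cond=()):
    """I(x; y | cond) in nats under a joint-Gaussian assumption."""
    rx, ry = residual(x, list(cond)), residual(y, list(cond))
    rho = dot(rx, ry) / math.sqrt(dot(rx, rx) * dot(ry, ry))
    return -0.5 * math.log(1.0 - rho * rho)

def measures(a_xin, a_yin, T=20000, c_xy=0.6, seed=1):
    """MI, ITX, ITY and MIT of the link X_{t-2} -> Y_t for given input coefficients."""
    X, Y, Z, W = simulate_linear(T, a_xin, a_xin, a_yin, a_yin, c_xy, seed)
    lag = lambda s, k: s[3 - k: len(s) - k]      # series at time t-k for t = 3..T-1
    x2, y0 = lag(X, 2), lag(Y, 0)
    par_y = [lag(Y, 1), lag(W, 1)]               # parents of Y_t without X_{t-2}
    par_x = [lag(X, 3), lag(Z, 3)]               # parents of X_{t-2}
    return {"MI": gauss_cmi(x2, y0), "ITX": gauss_cmi(x2, y0, par_x),
            "ITY": gauss_cmi(x2, y0, par_y), "MIT": gauss_cmi(x2, y0, par_y + par_x)}

if __name__ == "__main__":
    for a in (0.0, 0.4, 0.8):
        print(a, {k: round(v, 3) for k, v in measures(a, 0.0).items()})
```

In this Gaussian setting the MIT estimate should stay near $\frac{1}{2}\ln(1 + 0.36)$ while MI grows with the input coefficients of $X$, and ITX and ITY should bracket MIT up to estimation error.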

In the bottom panel of Fig. 4(a) we compare MIT (red with solid line) to TE according to
Eq. (2) truncated at $\tau_{\max} = 4$ (gray with dotted line), the CMI
$I^{\rm LINK}_{X\to Y}$ defining links in the time series graph, truncated at
$\tau_{\max} = 4$ (green with dash-dotted line), and DTE according to Eq. (3)
with = 3 (blue with dashed line). TE and LINK have a 
much larger estimation dimension of 17 (as much as 25 for $\tau_{\max} = 6$) compared to 6
for MIT and between 5 and 12 for the summands of DTE. Compared to DTE this leads to a
negative relative bias in TE of about 50% for the analytically known value for zero input
coefficients $I \approx 0.15$. Apart from this bias, TE and DTE scale similarly with the
input coefficients. LINK is dependent on $a_X$ as we expect from our analytical
considerations [Eq. (22)]. The MIT shows some slight dependence for strong inputs due to
estimation problems for short samples, but otherwise we demonstrate here also numerically
that only MIT fulfills the proposed property of coupling strength autonomy.

Figure 4: (Color online) Numerical experiments with the model Eq. (46) using time series
length $T = 1000$. In (a) we plot the ensemble average $\langle I\rangle_E$ for fixed
$c_{XY} = 0.6$ for all measures as specified in the main text. In (b) we show the ensemble
densities of all measures for different coupling coefficients $c_{XY} = 0.0, 0.3, 0.6, 0.9$
(from left to right red, yellow, green and blue solid lines). The densities are estimated
using Gaussian kernel smoothing according to Scott's rule, showing only the 90% most
probable ensemble members.

In Fig. 4(b) we show the whole densities of $E$ of all measures for different coupling
coefficients $c_{XY}$. The aim of this experiment is to measure how well the measures can
distinguish the coupling strength for different $c_{XY}$ as demanded by the property of
equitability. The dashed lines show the densities of the ensemble for
$a_X = c_{ZX} = a_Y = c_{WY} = 0$, i.e., if both $X$ and $Y$ are independent of their
parents.

As we now already expect, MI takes a whole range of values for the same $c_{XY}$. ITY is
broadly peaked towards higher $I$ values and ITX towards lower values, confirming the
inequality Eq. (12). Note that this relation holds only on average.

Only with MIT can the different coupling coefficients $c_{XY}$ be well distinguished. DTE
tends to slightly higher values for larger autodependencies within $X$ as expected from our
analytical results. Additionally, the variance of the DTE estimate is higher because each
summand's variance adds up to the total variance of the DTE estimate. The remaining four
plots demonstrate that TE and the CMI LINK strongly suffer from the negative bias associated
with high dimensional estimation depending on the chosen $\tau_{\max}$. TEs or LINKs
estimated with different $\tau_{\max}$ can, therefore, not be compared with each other.

For the 'unperturbed' case of zero inputs, the ensemble distributions of MI [dashed lines in
Fig. 4(b)] are, as expected, similar to the one for MIT with "conditioned-out" inputs (solid
lines) apart from a small bias and smaller variance related to slightly higher dimensional
estimation. For conditionally independent variables ($c_{XY} = 0$, red lines), all measures
have almost no bias, i.e., $I \approx 0$, which is a property of the kNN estimator and holds
also for short samples [25]. It may seem that apart from the bias, at least the variance is
much smaller for the high dimensional measures TE and LINK, but the relative variance
actually increases, leading to a worsened distinguishability.
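To make the role of the nearest-neighbor estimator more concrete, here is a minimal brute-force sketch of a Kraskov-Stoegbauer-Grassberger style kNN estimator for the unconditional MI of two scalar series. It is an assumption that this matches the estimator of [25, 26] in its details, and it is not the conditional estimator used for MIT; the function names and the choice $k = 5$ are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function evaluated at a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_mi(x, y, k=5):
    """kNN estimate (in nats) of I(X;Y) for scalar samples, maximum norm, O(N^2)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # distance to the k-th nearest neighbour in the joint space
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]
        # number of points strictly closer than eps in each marginal space
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(400)]
    y = [0.8 * u + 0.6 * rng.gauss(0.0, 1.0) for u in x]   # correlation 0.8
    z = [rng.gauss(0.0, 1.0) for _ in range(400)]          # independent of x
    print("dependent:  ", round(knn_mi(x, y), 3))
    print("independent:", round(knn_mi(x, z), 3))
```

For independent inputs the estimate should scatter around zero, which is the property referred to above; for the correlated Gaussian pair it should lie near $-\frac{1}{2}\ln(1 - 0.64)$.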

Summarizing, our experiments provide numerical evidence 
that MIT acts as an information-theoretic "filter" that excludes 
undesired effects of autodependency or other misleading in- 
puts. The MIT is, thus, specific only to the interaction of the 
two lagged subprocesses and can disentangle the measured 
coupling strengths of the different links in a time series graph. 
The commonly used measures MI and TE, on the other hand, 
are possibly affected also by the interactions that X and Y 
have with other processes. In this respect MIT is more in- 
tuitive and better interpretable than TE or MI. The coupling 
strength autonomy property can, thus, be regarded as one in- 
gredient of a multivariate extension of the equitability prop- 
erty. 



VIII. DISCUSSION AND LIMITATIONS

Let us here discuss some limitations of our approach:

i) Our notion of causality is to be understood only with respect to the observable processes
included in the parents, while the general notion of causality [24] requires excluding the
influence of the whole universe.

ii) The graphical model imposes a discrete description of causal interactions. Regarding the
source entropy, we face the problem that if a time-continuous process is sampled at some
interval $\Delta s$, there is an infinite set of unobserved nodes in between every $X_t$ and
$X_{t-1}$ for $X \in \mathbf{X}$ in the time series graph. We will, therefore, not be able
to access the source entropy solely at time $t$, but only the aggregated information in the
interval $[t - \Delta s, t]$. But for discrete processes graphical models are applicable to
the large class of models satisfying condition (S) in [20].

iii) Although the graphical model approach reduces the estimation dimension to a minimum,
the dimension can still be relatively high, leading to biased estimates for shorter samples.
A study on the effects of high dimensional estimation is subject to further research.
Generally, there are problems with entropy estimation for highly skewed distributions which
need to be resolved by improved estimators of CMI.

iv) Our two-step approach first necessitates the estimation of the time series graph, which
comes with the associated problems of false positive detections due to multiple testing and
missed causal links. These problems are analyzed in the Supplementary Material in [13].

v) As discussed in the coupling strength autonomy theorem, a coupling strength cannot in all
cases be attributed to one single coefficient. Only if this is the case, i.e., under the
conditions (1)-(3), can MIT filter out all influences from the parents of $X$ and $Y$. If
the dependency is nonlinear or sidepaths exist, one could use modifications of MIT like
$I^{\rm MITS}_{X\to Y}$ [Eq. (44)] and $I^{\rm MITN}_{X\to Y}$ [Eq. (45)] for a more
appropriate measure of coupling strength. Note that even though for full coupling strength
autonomy the link "$X_{t-\tau} \to Y_t$" needs to be linear, the remaining dependencies can
still be nonlinear and the source processes can have arbitrary distributions. The process
can, therefore, not easily be estimated using model-based regressions.

vi) Regarding equitability, a desired property of a coupling measure would be that it scales
linearly with the coupling parameter $c_{XY}$, like the partial correlation approximately in
the Gaussian case. As can be seen from the analytical derivations and the numerical example
in Fig. 4(b), MIT scales $\propto \ln(1 + c_{XY}^2 \cdots)$ for Gaussian dependencies, but a
linear scaling in this case can be attained by the transformation
$I \to \sqrt{1 - e^{-2I}}$ [4]. For more complex dependencies improved estimators that are
more adapted to the distributions might help.



IX. APPLICATION TO CLIMATOLOGICAL TIME SERIES



We now analyze monthly air temperature anomalies in the tropics at two different altitudes
in a NCEP/NCAR reanalysis data set [27]. To investigate the upwelling of heat from the sea
surface towards the upper troposphere at a height of about 12 km, we measure the coupling
strength between the surface pressure level ($X$ in Fig. 5) and the 200 hPa pressure level
($Y$) for all tropical (latitudes between 30°S and 30°N) grid points.

First, we estimated the time series graph using the algorithm introduced in [13] separately
for each surface-troposphere pair at each grid point using a significance threshold
estimated with the shuffle test as in [13]. We found, on average, the parents
$\mathcal{P}_{X_t} = \{X_{t-1}\}$ and $\mathcal{P}_{Y_t} = \{Y_{t-1}\}$, i.e., lag-1
autodependencies, and the contemporaneous link "$X_t - Y_t$".

With these parents, the spatial average of all lag functions of MIT in the left panel of
Fig. 5(a) shows the contemporaneous link "$X_t - Y_t$" as a significant peak, indicating
that the time scale of the coupling is below the lag of one month. The MI, on the other
hand, is significant for a wide range of lags, making an assessment of a physical coupling
delay difficult. While the contemporaneous link cannot be interpreted as a directed
coupling, we can still assess its strength. The MIT of a linear Gaussian process with the
same time series graph is $I^{\rm MIT}_{X-Y} = \frac{1}{2}\log(\ldots)$, while MI
additionally depends on the autodependency coefficients.

Figure 5(b) shows a large (compared to the extra tropics) $I^{\rm MI}_{X-Y}$ all across the
tropics. Significant $I^{\rm MIT}_{X-Y}$ values, on the other hand, are more confined and
largest between 90°E and 170°W. Larger MIT values indicate a stronger coupling between the
surface and upper tropospheric level in an area that actually corresponds to a region of
strong upwelling in the Walker circulation [28]. The difference between MI and MIT is
largest in the Eastern Pacific, where also the increased autodependency in surface air
temperatures is apparent ($I^{\rm MIT}_{X\to X}$). This strong persistence thus leads to a
spurious increase in MI, which, unlike MIT, cannot differentiate the effects of increased
autodependencies and increased contemporaneous coupling. With our measure of coupling
strength we are, thus, able to infer a more reasonable picture of the physical interactions
in the Walker circulation. This preliminary example underlines the importance of having a
meaningfully interpretable coupling measure.

Figure 5: (Color online) Analysis of air temperature anomalies at the surface ($X$) and the
upper troposphere ($Y$), $T = 1008$ months (1927-2011). (a) shows the spatial average and
standard deviation of coupling (left plot) and autodependency (middle plot for $X$, right
plot for $Y$) lag functions for MI (dashed lines in light colors) and MIT (solid lines in
dark colors). In (b) we spatially resolve the coupling strengths of the contemporaneous link
"$X_t - Y_t$" and the autodependency "$X_{t-1} \to X_t$" for MI (upper two panels) and MIT
(lower two panels). $I^{\rm MI}_{Y\to Y}$ and $I^{\rm MIT}_{Y\to Y}$ (not shown) are almost
the same all across the tropics. For the contemporaneous link values below the 98%
significance level are in white. CMIs estimated with $k = 10$.



X. CONCLUSIONS

To conclude, we have analytically and numerically shown that the commonly used measures MI
and TE can be rather unintuitive as measures of coupling strength. To overcome this
limitation, we propose a two-step approach, where in the first step the existence of
lag-specific couplings, i.e., the causal links, and contemporaneous links in a multivariate
process are determined as discussed in [13]. For the second step addressed in the present
article, we have generalized the information-theoretic MIT as a lag-specific measure that
has a property which we call coupling strength autonomy. It allows for a well interpretable
coupling strength reminiscent of an experimentally manipulable setting. As we prove
analytically and numerically, the coupling strength autonomy property is useful for models
of processes where the coupling strength can be attributed to one single coefficient, while
for other cases we suggest modifications of MIT as more appropriate measures. Compared to
TE, our MIT has the advantage of being practically computable without the need for arbitrary
truncations. Besides our example from climatology, also in other fields of science our
two-step approach promises to not only extract the causal direct (rather than the indirect)
connectivity among processes, but also to assess a meaningful coupling strength that,
together with the coupling delay, assists a physical interpretation.
Appendix

Here we give the proofs of the inequality relation between MIT, ITX and ITY in Eq. (12), the
coupling strength autonomy theorem and further discussions regarding the property of
coupling strength autonomy for processes violating the linearity condition (2).
I. PROOF OF INEQUALITY RELATION EQ. (12)



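The MIT
$I^{\rm MIT}_{X\to Y}(\tau) = I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}})$
between two uni- or multivariate subcomponents $X, Y$ of a stationary multivariate
discrete-time stochastic process $\mathbf{X}$ with time series graph $G$ and parents
$\mathcal{P}$ as defined in the main article, is bounded by the two CMIs with condition on
either parents [Eq. (12)]

$$I(X_{t-\tau}; Y_t \mid \mathcal{P}_{X_{t-\tau}}) \;\le\; I^{\rm MIT}_{X\to Y}(\tau) \;\le\; I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}), \tag{A1}$$

where $\tau > 0$. The right inequality holds for all processes satisfying the very general
condition (S) in [20] and the left inequality if additionally the "no sidepath"-constraint
for the coupling "$X_{t-\tau} \to Y_t$" holds, that is, if $X_{t-\tau}$ is separated from
the remaining parents $\mathcal{P}_{Y_t}\setminus\mathcal{P}_{X_{t-\tau}}$ of $Y_t$ (other
than $X_{t-\tau}$ itself) by its parents $\mathcal{P}_{X_{t-\tau}}$ in the time series
graph. For a definition of separation see [20].

To prove the right inequality, let $\tilde{\mathcal{P}}_{X_{t-\tau}}$ be the set of parents
of $X_{t-\tau}$ that is not already included in $\mathcal{P}_{Y_t}$, i.e.,
$\tilde{\mathcal{P}}_{X_{t-\tau}} = \mathcal{P}_{X_{t-\tau}}\setminus\mathcal{P}_{Y_t}$.
Then it holds that $I(\tilde{\mathcal{P}}_{X_{t-\tau}}; Y_t \mid \mathcal{P}_{Y_t}) = 0$
because the parents $\mathcal{P}_{Y_t}$ separate $Y_t$ from any subset of
$\mathbf{X}^-_t\setminus\mathcal{P}_{Y_t}$ and separation in the time series graph implies
conditional independence between the subprocesses [20, Thm. 4.1]. Now we apply the chain
rule on the (multivariate) CMI
$I(X_{t-\tau}, \tilde{\mathcal{P}}_{X_{t-\tau}}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\})$
twice:

$$\begin{aligned}
I(X_{t-\tau}, \tilde{\mathcal{P}}_{X_{t-\tau}}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\})
&= I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}) + \underbrace{I(\tilde{\mathcal{P}}_{X_{t-\tau}}; Y_t \mid \mathcal{P}_{Y_t})}_{=0} \\
&= \underbrace{I(\tilde{\mathcal{P}}_{X_{t-\tau}}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\})}_{\ge 0} + I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}) \\
\Longrightarrow\quad I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}})
&\le I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}).
\end{aligned}$$

Note that (conditional) mutual information is always non-negative.

For the left inequality we now define $\tilde{\mathcal{P}}_{Y_t}$ to be the set of parents
of $Y_t$ that is not already included in $\mathcal{P}_{X_{t-\tau}}$, i.e.,
$\tilde{\mathcal{P}}_{Y_t} = \mathcal{P}_{Y_t}\setminus\mathcal{P}_{X_{t-\tau}}$. Then under
the "no sidepath"-constraint it holds that
$I(\tilde{\mathcal{P}}_{Y_t}\setminus\{X_{t-\tau}\}; X_{t-\tau} \mid \mathcal{P}_{X_{t-\tau}}) = 0$.
Note that all paths emanating from $X_{t-\tau}$ towards the past are surely blocked by
$\mathcal{P}_{X_{t-\tau}}$ because they contain the motifs
"$\to Z_{t-\tau'} \to X_{t-\tau}$" or "$- Z_{t-\tau'} \to X_{t-\tau}$" which are both
blocked as $Z_{t-\tau'} \in \mathcal{P}_{X_{t-\tau}}$. The "no sidepath"-constraint further
demands that there are no unblocked paths to $\tilde{\mathcal{P}}_{Y_t}$ emanating towards
the present or future. Again, we apply the chain rule on the (multivariate) CMI
$I(X_{t-\tau}; Y_t, \tilde{\mathcal{P}}_{Y_t}\setminus\{X_{t-\tau}\} \mid \mathcal{P}_{X_{t-\tau}})$
twice:

$$\begin{aligned}
I(X_{t-\tau}; Y_t, \tilde{\mathcal{P}}_{Y_t}\setminus\{X_{t-\tau}\} \mid \mathcal{P}_{X_{t-\tau}})
&= I(X_{t-\tau}; Y_t \mid \mathcal{P}_{X_{t-\tau}}) + \underbrace{I(X_{t-\tau}; \tilde{\mathcal{P}}_{Y_t}\setminus\{X_{t-\tau}\} \mid \mathcal{P}_{X_{t-\tau}}, Y_t)}_{\ge 0} \\
&= \underbrace{I(\tilde{\mathcal{P}}_{Y_t}\setminus\{X_{t-\tau}\}; X_{t-\tau} \mid \mathcal{P}_{X_{t-\tau}})}_{=0} + I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}) \\
\Longrightarrow\quad I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}})
&\ge I(X_{t-\tau}; Y_t \mid \mathcal{P}_{X_{t-\tau}}).
\end{aligned}$$



II. DERIVATIONS FOR ANALYTICAL MODEL EQ. (13)

Defining variances and covariances by

$$\Gamma_X \equiv E[X_t^2], \qquad \Gamma_{XY}(\tau) \equiv E[X_t\, Y_{t-\tau}], \tag{A2}$$

for model Eq. (13) the variances are

$$\begin{aligned}
\Gamma_Z &= c_{XZ}^2\,\Gamma_X + \sigma_Z^2 \\
\Gamma_Y &= c_{XY}^2\,\Gamma_X + c_{WY}^2\,\Gamma_W + \sigma_Y^2.
\end{aligned}$$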

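Further, auto-covariances are

$$\begin{aligned}
\Gamma_{XX}(\tau) &= a_X^{\tau}\,\Gamma_X \\
\Gamma_{YY}(\tau) &= c_{XY}^2\,\Gamma_{XX}(\tau) \\
\Gamma_{ZZ}(\tau) &= c_{XZ}^2\,\Gamma_{XX}(\tau) \\
\Gamma_{WW}(\tau) &= 0,
\end{aligned}$$

with $\Gamma_{XX}(\tau = 0) = \Gamma_X$. The covariances for $\tau > 0$ are given by

$$\begin{aligned}
\Gamma_{YX}(\tau) &= c_{XY}\,\Gamma_{XX}(\tau - 2) \\
\Gamma_{XY}(\tau) &= a_X c_{XY}\,\Gamma_{XX}(\tau + 1) \\
\Gamma_{ZX}(\tau) &= c_{XZ}\,\Gamma_{XX}(\tau - 1) \\
\Gamma_{XZ}(\tau) &= a_X c_{XZ}\,\Gamma_{XX}(\tau) \\
\Gamma_{XW}(\tau) &= \Gamma_{WX}(\tau) = 0 \\
\Gamma_{ZY}(\tau) &= c_{XY} c_{XZ}\,\Gamma_{XX}(\tau + 1) \\
\Gamma_{YZ}(\tau) &= c_{XY} c_{XZ}\,\Gamma_{XX}(\tau - 1) \\
\Gamma_{ZW}(\tau) &= \Gamma_{WZ}(\tau) = 0 \\
\Gamma_{YW}(\tau) &= c_{WY}\,\delta(\tau - 1)\,\Gamma_W \\
\Gamma_{WY}(\tau) &= 0,
\end{aligned}$$

with the Kronecker delta $\delta(s) = 1$ for $s = 0$ and $\delta(s) = 0$ else. These
covariances form the entries of the covariance matrices that are needed to compute the
conditional entropies.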



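A. Derivations of TE

For the derivation of TE

$$I^{\rm TE}_{X\to Y} = H(Y_t \mid Y^-_t, W^-_t, Z^-_t) - H(Y_t \mid X^-_t, Y^-_t, W^-_t, Z^-_t)$$

we know from Markov properties that the latter term is the source entropy
$H(Y_t \mid \mathcal{P}_{Y_t}) = \frac{1}{2}\ln 2\pi e\,\sigma_Y^2$. For the first entropy

$$H(Y_t \mid Y^-_t, W^-_t, Z^-_t) = \frac{1}{2}\ln\left(2\pi e\,\frac{|\Gamma_{Y_tY^-W^-Z^-}|}{|\Gamma_{Y^-W^-Z^-}|}\right) \tag{A3}$$

we can write the covariance as a block matrix

$$\Gamma_{Y_tY^-W^-Z^-} = \begin{pmatrix}
\Gamma_Y & \Gamma_{Y_t;Y^-} & \Gamma_{Y_t;W^-} & \Gamma_{Y_t;Z^-} \\
\Gamma^{\top}_{Y_t;Y^-} & \Gamma_{Y^-} & \Gamma_{Y^-;W^-} & \Gamma_{Y^-;Z^-} \\
\Gamma^{\top}_{Y_t;W^-} & \Gamma^{\top}_{Y^-;W^-} & \Gamma_{W^-} & \Gamma_{W^-;Z^-} \\
\Gamma^{\top}_{Y_t;Z^-} & \Gamma^{\top}_{Y^-;Z^-} & \Gamma^{\top}_{W^-;Z^-} & \Gamma_{Z^-}
\end{pmatrix} \tag{A4}$$

where, e.g., $\Gamma_{Y_t;W^-}$ is an infinite vector with entries of the covariances of
$Y_t$ with $W_{t-1}, W_{t-2}, \ldots$ The quotient in Eq. (A3) of these infinite dimensional
matrices is difficult if not impossible to evaluate in the general case. Here, we will only
consider two simple cases.

1. $c_{XZ} = c_{WY} = 0$

For the case of $c_{XZ} = c_{WY} = 0$, i.e., as inputs solely an autodependency in $X$, the
covariance matrix takes the simple form

$$\Gamma_{Y_tY^-W^-Z^-} = \begin{pmatrix} \Gamma_{Y_tY^-} & 0 \\ 0 & \Gamma_{W^-Z^-} \end{pmatrix} \tag{A5}$$

where the top left block is an infinite dimensional Toeplitz matrix, i.e., a Toeplitz
operator. Then the quotient in Eq. (A3) can be simplified to

$$\frac{|\Gamma_{Y_tY^-}|\,|\Gamma_{W^-Z^-}|}{|\Gamma_{Y^-}|\,|\Gamma_{W^-Z^-}|} = \frac{|\Gamma_{Y_tY^-}|}{|\Gamma_{Y^-}|} \tag{A6}$$

with $\Gamma_{Y_tY^-}$ and $\Gamma_{Y^-}$ being symmetric Toeplitz matrices $G_\tau$ and
$G_{\tau-1}$ with diagonal elements $\Gamma_Y$ and off-diagonal elements $g_\tau$:

$$g_0 = \Gamma_Y = \sigma_Y^2 + \frac{c_{XY}^2\,\sigma_X^2}{1 - a_X^2} \tag{A7}$$

$$g_\tau = a_X^{|\tau|}\,\frac{c_{XY}^2\,\sigma_X^2}{1 - a_X^2} \quad \text{for } \tau \ge 1. \tag{A8}$$

The desired TE is then given by

$$I^{\rm TE}_{X\to Y} = \lim_{\tau\to\infty} \frac{1}{2}\ln\frac{|G_\tau|}{\sigma_Y^2\,|G_{\tau-1}|}. \tag{A9}$$

To obtain the limit of the ratio of Toeplitz matrices we can utilize Szegő's theorem
[21, 22] which relates the limit to the geometric mean of a function $f(\lambda)$

$$\lim_{\tau\to\infty}\frac{|G_\tau(f)|}{|G_{\tau-1}(f)|} = \exp\left(\frac{1}{2\pi}\int_0^{2\pi}\ln f(\lambda)\,d\lambda\right), \tag{A10}$$

which requires that the Toeplitz matrix is in the Wiener class, i.e., the entries must be
absolutely summable, which we assume here. The function $f(\lambda)$ is the Fourier series
with the entries of the Toeplitz matrix being the coefficients

$$f(\lambda) = \sum_{\tau=-\infty}^{\infty} g_\tau\, e^{i\tau\lambda} = \Gamma_Y + 2\sum_{\tau=1}^{\infty} g_\tau\, e^{i\tau\lambda} \tag{A11}$$

$$= \sigma_Y^2 + \frac{c_{XY}^2\,\sigma_X^2}{1 - a_X^2} + 2\,\frac{c_{XY}^2\,\sigma_X^2}{1 - a_X^2}\,\frac{a_X e^{i\lambda}}{1 - a_X e^{i\lambda}} \tag{A12}$$

$$= \frac{\alpha\, e^{i\lambda} + \beta}{(1 - a_X^2)(1 - a_X e^{i\lambda})} \tag{A13}$$

with $\beta = c_{XY}^2\sigma_X^2 + \sigma_Y^2(1 - a_X^2)$ and $\alpha < \beta$ for
$|a_X| < 1$. Then the TE is

$$\begin{aligned}
I^{\rm TE}_{X\to Y} &= \lim_{\tau\to\infty}\frac{1}{2}\ln\frac{|G_\tau|}{\sigma_Y^2\,|G_{\tau-1}|} && \text{(A14)} \\
&= \lim_{\tau\to\infty}\frac{1}{2}\ln\frac{|G_\tau|}{|G_{\tau-1}|} - \frac{1}{2}\ln\sigma_Y^2 && \text{(A15)} \\
&= \frac{1}{2}\ln\lim_{\tau\to\infty}\frac{|G_\tau|}{|G_{\tau-1}|} - \frac{1}{2}\ln\sigma_Y^2 && \text{(A16)} \\
&= \frac{1}{2}\ln\exp\left(\frac{1}{2\pi}\int_0^{2\pi}\ln f(\lambda)\,d\lambda\right) - \frac{1}{2}\ln\sigma_Y^2
 = \frac{1}{4\pi}\int_0^{2\pi}\ln f(\lambda)\,d\lambda - \frac{1}{2}\ln\sigma_Y^2 && \text{(A17)} \\
&= \frac{1}{4\pi}\Bigg[\underbrace{\int_0^{2\pi}\ln\left(\alpha e^{i\lambda} + \beta\right)d\lambda}_{(*)} - \ln(1 - a_X^2)\int_0^{2\pi}d\lambda - \underbrace{\int_0^{2\pi}\ln\left(1 - a_X e^{i\lambda}\right)d\lambda}_{(**)}\Bigg] - \frac{1}{2}\ln\sigma_Y^2, && \text{(A18)}
\end{aligned}$$

where the integrals $(*)$ and $(**)$ can be evaluated using contour integration to

$$(*) = 2\pi\ln\beta = 2\pi\ln\left(c_{XY}^2\sigma_X^2 + \sigma_Y^2(1 - a_X^2)\right) \quad \text{for } \alpha < \beta, \tag{A19}$$

$$(**) = 2\pi\ln 1 = 0 \quad \text{for } a_X < 1. \tag{A20}$$

The TE is thus

$$I^{\rm TE}_{X\to Y} = \frac{1}{2}\ln\left(1 + \frac{c_{XY}^2\,\sigma_X^2}{\sigma_Y^2\,(1 - a_X^2)}\right) \tag{A21}$$

and depends on the autodependency strength of $X$.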
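The limit statement of Szegő's theorem used in the derivation above can be checked numerically on a simple case. The sketch below is an illustration only: it uses the covariance sequence of a plain AR(1) process, $g_\tau = a^{|\tau|}\sigma^2/(1 - a^2)$ with symbol $f(\lambda) = \sigma^2/|1 - a e^{i\lambda}|^2$, not the matrix of Eqs. (A7) and (A8), and all function names are assumptions. For this example both sides should equal the innovation variance $\sigma^2$.

```python
import cmath
import math

def toeplitz_det(g, n):
    """Determinant of the n x n symmetric Toeplitz matrix with entries g(|i - j|)."""
    M = [[g(abs(i - j)) for j in range(n)] for i in range(n)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0.0:
            return 0.0
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det                      # a row swap flips the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= fac * M[col][c]
    return det

def geometric_mean(f, steps=4096):
    """exp of the average of ln f over [0, 2*pi), evaluated with the midpoint rule."""
    h = 2.0 * math.pi / steps
    return math.exp(sum(math.log(f((m + 0.5) * h)) for m in range(steps)) / steps)

a, var = 0.6, 1.0
g = lambda tau: var * a ** tau / (1.0 - a * a)                   # AR(1) autocovariance
f = lambda lam: var / abs(1.0 - a * cmath.exp(1j * lam)) ** 2    # its Fourier symbol

if __name__ == "__main__":
    for n in (2, 6, 12):
        print(n, toeplitz_det(g, n) / toeplitz_det(g, n - 1))
    print("geometric mean of f:", geometric_mean(f))
```

The determinant ratio reaches the limit already for small matrix sizes in this example, because the AR(1) covariance matrix has the closed-form determinant $\sigma^{2n}/(1 - a^2)$.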
2. $a_X = 0$



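Now the process "decouples in time" since no autodependencies are present. The covariance
matrix is

$$\Gamma_{Y_tY^-W^-Z^-} = \begin{pmatrix}
\Gamma_Y & 0 & \Gamma_{Y_t;W^-} & \Gamma_{Y_t;Z^-} \\
0 & \Gamma_{Y^-} & \Gamma_{Y^-;W^-} & \Gamma_{Y^-;Z^-} \\
\Gamma^{\top}_{Y_t;W^-} & \Gamma^{\top}_{Y^-;W^-} & \Gamma_{W^-} & 0 \\
\Gamma^{\top}_{Y_t;Z^-} & \Gamma^{\top}_{Y^-;Z^-} & 0 & \Gamma_{Z^-}
\end{pmatrix} \tag{A22}$$

with the blocks being

$$\begin{aligned}
\Gamma_Y &= c_{XY}^2\sigma_X^2 + c_{WY}^2\sigma_W^2 + \sigma_Y^2, &
\Gamma_Z &= c_{XZ}^2\sigma_X^2 + \sigma_Z^2, \\
\Gamma_{Y_t;W^-} &= (c_{WY}\sigma_W^2, 0, 0, \ldots), &
\Gamma_{Y_t;Z^-} &= (c_{XY}c_{XZ}\sigma_X^2, 0, 0, \ldots), \\
\Gamma_{Y^-;W^-} &= c_{WY}\sigma_W^2\, S, &
\Gamma_{Y^-;Z^-} &= c_{XY}c_{XZ}\sigma_X^2\, S, \\
\Gamma_{Y^-} &= \Gamma_Y\, I, \quad \Gamma_{W^-} = \sigma_W^2\, I, &
\Gamma_{Z^-} &= \Gamma_Z\, I,
\end{aligned}$$

where $I$ is the identity matrix and $S$ is the shift matrix with ones on the superdiagonal,
i.e., the first upper off-diagonal, and zeros everywhere else. The quotient in Eq. (A3) can
be simplified by expressing the block matrix in terms of the Schur complement of the
covariance block $\Gamma_{Y^-W^-Z^-}$:

$$\frac{|\Gamma_{Y_tY^-W^-Z^-}|}{|\Gamma_{Y^-W^-Z^-}|} = \Gamma_Y - \left(\Gamma_{Y_t;Y^-}, \Gamma_{Y_t;W^-}, \Gamma_{Y_t;Z^-}\right)\left(\Gamma_{Y^-W^-Z^-}\right)^{-1}\left(\Gamma_{Y_t;Y^-}, \Gamma_{Y_t;W^-}, \Gamma_{Y_t;Z^-}\right)^{\top}. \tag{A23}$$

Since the vector $(\Gamma_{Y_t;Y^-}, \Gamma_{Y_t;W^-}, \Gamma_{Y_t;Z^-})$ contains only two
non-zero elements, we do not have to take the infinite limit and do not need to invert the
whole matrix $\Gamma_{Y^-W^-Z^-}$. A simple calculation yields

$$\frac{|\Gamma_{Y_tY^-W^-Z^-}|}{|\Gamma_{Y^-W^-Z^-}|} = c_{WY}^2\sigma_W^2 + c_{XY}^2\sigma_X^2 + \sigma_Y^2 - c_{WY}^2\sigma_W^2 - \frac{c_{XY}^2 c_{XZ}^2\,\sigma_X^4}{c_{XZ}^2\sigma_X^2 + \sigma_Z^2} \tag{A24}$$

from which we get

$$I^{\rm TE}_{X\to Y} = \frac{1}{2}\ln\left(1 + \frac{c_{XY}^2\,\sigma_X^2\,\sigma_Z^2}{\sigma_Y^2\left(c_{XZ}^2\sigma_X^2 + \sigma_Z^2\right)}\right). \tag{A25}$$

Here, the TE depends on the coupling strength of $X$ with $Z$, which seems rather
unintuitive. This formula could have also been derived by exploiting separation properties
of the corresponding time series graph (i.e., Markov properties of the process), from which
a much smaller set of conditions can be inferred.



B. MIT and related measures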



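The measures based on the parental sets are much easier to derive because they involve only
finite and very low dimensional covariance matrices. As an example, for the entropy
$H(Y_t \mid W_{t-1}, X_{t-3})$ needed to compute the MIT, the covariance matrix of
$(Y_t, W_{t-1}, X_{t-3})$ is

$$\Gamma = \begin{pmatrix}
\dfrac{c_{XY}^2\sigma_X^2}{1 - a_X^2} + c_{WY}^2\sigma_W^2 + \sigma_Y^2 & c_{WY}\sigma_W^2 & \dfrac{a_X c_{XY}\sigma_X^2}{1 - a_X^2} \\
c_{WY}\sigma_W^2 & \sigma_W^2 & 0 \\
\dfrac{a_X c_{XY}\sigma_X^2}{1 - a_X^2} & 0 & \dfrac{\sigma_X^2}{1 - a_X^2}
\end{pmatrix}. \tag{A26}$$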
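As a worked check, the covariance matrix of Eq. (A26) can be turned into the MIT via the Schur complement and the Gaussian entropy formula. The sketch below is an illustration under the assumption that the entries are as written above; the function names are hypothetical. The result should not depend on $a_X$ or $c_{WY}$ and should equal $\frac{1}{2}\ln(1 + c_{XY}^2\sigma_X^2/\sigma_Y^2)$.

```python
import math

def cond_var_first(cov):
    """Variance of the first of three jointly Gaussian variables given the other two."""
    (s00, s01, s02), (_, s11, s12), (_, _, s22) = cov
    det = s11 * s22 - s12 * s12
    # Schur complement: s00 - b^T D^{-1} b with b = (s01, s02), D the lower 2x2 block
    quad = (s01 * (s22 * s01 - s12 * s02) + s02 * (s11 * s02 - s12 * s01)) / det
    return s00 - quad

def mit_from_covariance(a_x, c_xy, c_wy, var_x=1.0, var_y=1.0, var_w=1.0):
    """MIT = H(Y_t | W_{t-1}, X_{t-3}) - H(Y_t | parents of Y_t) for the Gaussian model."""
    gam_x = var_x / (1.0 - a_x ** 2)
    gam_y = c_xy ** 2 * gam_x + c_wy ** 2 * var_w + var_y
    cov = [[gam_y, c_wy * var_w, a_x * c_xy * gam_x],
           [c_wy * var_w, var_w, 0.0],
           [a_x * c_xy * gam_x, 0.0, gam_x]]
    h_cond = 0.5 * math.log(2.0 * math.pi * math.e * cond_var_first(cov))
    h_source = 0.5 * math.log(2.0 * math.pi * math.e * var_y)
    return h_cond - h_source

if __name__ == "__main__":
    for a_x in (0.0, 0.5, 0.9):
        for c_wy in (0.0, 0.8):
            print(a_x, c_wy, round(mit_from_covariance(a_x, 0.6, c_wy), 6))
```

All printed values coincide, which is the coupling strength autonomy of MIT for this analytical model.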



III. PROOF OF COUPLING STRENGTH AUTONOMY 
THEOREM 

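To compute MIT,

$$\begin{aligned}
I^{\rm MIT}_{X\to Y}(\tau) &\equiv I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}) \\
&= H(Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}) - H(Y_t \mid \mathcal{P}_{Y_t}),
\end{aligned}$$

we need the source entropy $H(Y_t \mid \mathcal{P}_{Y_t})$ and the conditional entropy
$H(Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}})$. For the
following steps we firstly use the independence of the i.i.d. variables $\eta^X_{t-\tau}$
and $\eta^Y_t$ of processes in the past, i.e., $I(\eta^Y_t; \mathbf{X}^-_t) = 0$, and
further due to the data processing inequality [4] also

$$I(\eta^Y_t; f(\mathbf{X}^-_t)) = 0 \tag{A27}$$

and correspondingly $I(\eta^X_{t-\tau}; g(\mathbf{X}^-_{t-\tau})) = 0$ for arbitrary
functions $f, g$. This implies in particular $I(\eta^Y_t; f(\mathcal{P}_{Y_t})) = 0$ and
$I(\eta^X_{t-\tau}; g(\mathcal{P}_{X_{t-\tau}})) = 0$. Secondly, we use that generally for
random variables $Y$ and $W$ and an arbitrary function $f$

$$\begin{aligned}
H(Y + f(W) \mid W) &= \int p(w)\, H(Y + f(W) \mid W = w)\, dw \\
&= \int p(w)\, H(Y \mid W = w)\, dw \\
&= H(Y \mid W),
\end{aligned} \tag{A28}$$

because $f(W)$ for $W = w$ is a fixed constant and entropies are translationally invariant.

Then, for
$f(\mathcal{P}_{Y_t}) = f(X_{t-\tau}) + g_Y(\mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\})$, the
source entropy is

$$\begin{aligned}
H(Y_t \mid \mathcal{P}_{Y_t}) &= H(f(\mathcal{P}_{Y_t}) + \eta^Y_t \mid \mathcal{P}_{Y_t}) && \text{(A29)} \\
&= H(\eta^Y_t \mid \mathcal{P}_{Y_t}) && \text{(A30)} \\
&= H(\eta^Y_t), && \text{(A31)}
\end{aligned}$$

and depends only on the distribution of the source process $\eta^Y$. This relation holds
generally if $Y_t$ additively depends on its parents.

Next, to compute the other conditional entropy, we insert Eq. (33) in
$H(Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}})$ and get

$$\begin{aligned}
&H\left(f\big(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})\big) + g_Y(\mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}) + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}\right) \\
&\quad = H\left(f\big(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})\big) + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}\right)
\end{aligned} \tag{A32}$$

also due to translational invariance. If we only assume condition (1) this relation cannot
be much further simplified.

To arrive at a CMI again, we need to expand the source entropy using Eq. (A28) and (A27).
First, we add the same conditions as in Eq. (A32), which is possible since $\eta^Y_t$ is
independent of all past processes:

$$H(\eta^Y_t) = H(\eta^Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}). \tag{A33}$$

Next, we insert the term $f(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}}))$ and
"condition it out again" using Eq. (A28) by adding $\eta^X_{t-\tau}$ to the conditions
($\mathcal{P}_{X_{t-\tau}}$ is already included):

$$H(\eta^Y_t \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}) = H\left(f\big(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})\big) + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}, \eta^X_{t-\tau}\right). \tag{A34}$$

Then via

$$\begin{aligned}
I^{\rm MIT}_{X\to Y} = {} & H\left(f\big(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})\big) + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}\right) \\
& - H\left(f\big(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})\big) + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}, \eta^X_{t-\tau}\right)
\end{aligned}$$

we arrive at Eq. (38).

If we assume conditions (1) and (2), we can further simplify Eq. (A32) since
$f(\eta^X_{t-\tau} + g_X(\mathcal{P}_{X_{t-\tau}})) = c\,\eta^X_{t-\tau} + c\,g_X(\mathcal{P}_{X_{t-\tau}})$,
and therefore

$$\begin{aligned}
&H\left(c\,\eta^X_{t-\tau} + c\,g_X(\mathcal{P}_{X_{t-\tau}}) + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}\right) \\
&\quad = H\left(c\,\eta^X_{t-\tau} + \eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}\right)
\end{aligned} \tag{A35}$$

where we used Eq. (A28) and the fact that
$I(c\,\eta^X_{t-\tau} + \eta^Y_t; \mathcal{P}_{X_{t-\tau}} \mid \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}) = 0$
(also holds without the condition on $\mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}$ because
$\mathcal{P}_{X_{t-\tau}}$ lies in the past of both $\eta^X_{t-\tau}$ and $\eta^Y_t$).
Extending the source entropy again we arrive at Eq. (37). If the "sidepath"-parents in

$$\mathcal{P}^S_{Y_t} = \left\{ W^i_{t-\tau_i} \in \mathcal{P}_{Y_t}\setminus\mathcal{P}_{X_{t-\tau}} \,:\, I\left(W^i_{t-\tau_i}; X_{t-\tau} \,\middle|\, \mathcal{P}_{X_{t-\tau}}\right) > 0 \right\} \tag{A36}$$

are additively separated from the remaining parents, MIT can be further simplified.

If additionally condition (3) holds, then Eq. (35) leads to
$I(c\,\eta^X_{t-\tau} + \eta^Y_t; \mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}) = 0$, and we,
therefore, can drop $\mathcal{P}_{Y_t}\setminus\{X_{t-\tau}\}$ from the conditions, from
which Eq. (36) follows.

For the contemporaneous MIT

$$I^{\rm MIT}_{X - Y} \equiv I\left(X_t; Y_t \,\middle|\, \mathcal{P}_{Y_t}, \mathcal{P}_{X_t}, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}, \mathcal{P}(\mathcal{N}_{X_t}\setminus\{Y_t\}), \mathcal{P}(\mathcal{N}_{Y_t}\setminus\{X_t\})\right)$$

we only need condition (1), for which the entropy in the first term is

$$\begin{aligned}
&H\left(\eta^Y_t + g_Y(\mathcal{P}_{Y_t}) \,\middle|\, \mathcal{P}_{Y_t}, \mathcal{P}_{X_t}, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}, \mathcal{P}(\mathcal{N}_{X_t}\setminus\{Y_t\}), \mathcal{P}(\mathcal{N}_{Y_t}\setminus\{X_t\})\right) && \text{(A37)} \\
&\quad = H\left(\eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}, \mathcal{P}_{X_t}, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}, \mathcal{P}(\mathcal{N}_{X_t}\setminus\{Y_t\}), \mathcal{P}(\mathcal{N}_{Y_t}\setminus\{X_t\})\right) && \text{(A38)} \\
&\quad = H\left(\eta^Y_t \,\middle|\, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}\right), && \text{(A39)}
\end{aligned}$$

again due to translational invariance of entropy [Eq. (A28)] and the independence of
$\eta^Y_t$ of past processes [Eq. (A27)]. For the same reasons the entropy in the second
term becomes

$$\begin{aligned}
&H\left(\eta^Y_t + g_Y(\mathcal{P}_{Y_t}) \,\middle|\, \mathcal{P}_{Y_t}, \mathcal{P}_{X_t}, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}, \mathcal{P}(\mathcal{N}_{X_t}\setminus\{Y_t\}), \mathcal{P}(\mathcal{N}_{Y_t}\setminus\{X_t\}), X_t\right) \\
&\quad = H\left(\eta^Y_t \,\middle|\, \mathcal{P}_{Y_t}, \mathcal{P}_{X_t}, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}, \mathcal{P}(\mathcal{N}_{X_t}\setminus\{Y_t\}), \mathcal{P}(\mathcal{N}_{Y_t}\setminus\{X_t\}), \eta^X_t + g_X(\mathcal{P}_{X_t})\right) \\
&\quad = H\left(\eta^Y_t \,\middle|\, \mathcal{N}_{X_t}\setminus\{Y_t\}, \mathcal{N}_{Y_t}\setminus\{X_t\}, \eta^X_t\right),
\end{aligned} \tag{A40}$$

because knowing $\eta^X_t + g_X(\mathcal{P}_{X_t})$ and $\mathcal{P}_{X_t}$ is equivalent to
knowing $\eta^X_t$ and $\mathcal{P}_{X_t}$. Then Eq. (39) follows, which finishes the proof.

Similarly, MITS and MITN can be simplified if the dependency $g_Y$ is additive in the
parents.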

IV. FURTHER NUMERICAL EXPERIMENTS 

In Fig. 6 we show results of our numerical experiments for the model class Eq. (46) with a
nonlinear (squared) dependency $f(x)$ of the link "$X_{t-2} \to Y_t$" using the same
ensemble setup $E$ as before. As discussed in Sect. V, the source process $\eta^X_{t-2}$
then mixes with its parents and it does not make sense to attribute the coupling strength to
one single coefficient. As a result, the average of MIT in Fig. 6(a) tends to larger values
for increased $a_X = c_{ZX}$, thus the inputs are not entirely "filtered out". Still, MIT is
much less affected than MI.

Regarding the inequality relation Eq. (12), a nonlinear dependency does not affect at least
the right side $I^{\rm MIT}_{X\to Y}(\tau) \le I^{\rm ITY}_{X\to Y}(\tau)$ as demonstrated
in Fig. 6(a) and (b). Although the left side of the inequality relation
$I^{\rm ITX}_{X\to Y}(\tau) \le I^{\rm MIT}_{X\to Y}(\tau)$ should hold under the same
general condition (S) in [20] and the "no sidepath"-constraint, it seems to be violated for
large $a_X = c_{ZX}$ (and small $a_Y = c_{WY}$). This could be related to highly skewed
distributions for nonlinear $f(x)$.

In the bottom plot of Fig. 6(a) it might seem that TE and LINK are less affected, but
actually the relative variance is much higher.

Figure 6: (Color online) Numerical experiments with the model Eq. (46) with setup as before
but for squared dependency $f(x)$ with $c_{XY} = 0.6$.